¶ 1

Introduction

Manticore Search is a multi-storage database specifically designed for search, with robust full-text search capabilities.

As an open-source database (available on GitHub), Manticore Search was created in 2017 as a continuation of Sphinx Search engine. Our development team took all the best features of Sphinx and significantly improved its functionality, fixing hundreds of bugs along the way (as detailed in our Changelog). With nearly complete code rewrites, Manticore Search is now a modern, fast, and light-weight database with full features and exceptional full-text search capabilities.

Manticore's key features are:

Powerful and fast full-text searching that works well for small and large datasets

Vector search capabilities

Manticore Search supports the ability to add embeddings generated by your Machine Learning models to each document, and then doing a nearest-neighbor search on them. This lets you build features like similarity search, recommendations, semantic search, and relevance ranking based on NLP algorithms, among others, including image, video, and sound searches.

Multithreading

Manticore Search utilizes a smart query parallelization to lower response time and fully utilize all CPU cores when needed.

Cost-based query optimizer

The cost-based query optimizer uses statistical data about the indexed data to evaluate the relative costs of different execution plans for a given query. This allows the optimizer to determine the most efficient plan for retrieving the desired results, taking into account factors such as the size of the indexed data, the complexity of the query, and the available resources.

Storage options

Manticore offers both row-wise and column-oriented storage options to accommodate datasets of various sizes. The traditional and default row-wise storage option is available for datasets of all sizes - small, medium, and large, while the columnar storage option is provided through the Manticore Columnar Library for even larger datasets. The key difference between these storage options is that row-wise storage requires all attributes (excluding full-text fields) to be kept in RAM for optimal performance, while columnar storage does not, thus offering lower RAM consumption, but with a potential for slightly slower performance (as demonstrated by the statistics on https://db-benchmarks.com/).

Automatic secondary indexes

Manticore Columnar Library uses Piecewise Geometric Model index, which exploits a learned mapping between the indexed keys and their location in memory. The succinctness of this mapping, coupled with a peculiar recursive construction algorithm, makes the PGM-index a data structure that dominates traditional indexes by orders of magnitude in space while still offering the best query and update time performance. Secondary indexes are ON by default for all numeric fields.

SQL-first

Manticore's native syntax is SQL and it supports SQL over HTTP and MySQL protocol, allowing for connection through popular mysql clients in any programming language.

JSON over HTTP

For a more programmatic approach to managing data and schemas, Manticore provides HTTP JSON protocol, similar to that of Elasticsearch.

Elasticsearch-compatible writes

You can execute Elasticsearch-compatible insert and replace JSON queries which enables using Manticore with tools like Logstash (version < 7.13), Filebeat and other tools from the Beats family.

Declarative and imperative schema management

Easily create, update, and delete tables online or through a configuration file.

The benefits of C++ and the convenience of PHP

The Manticore Search daemon is developed in C++, offering fast start times and efficient memory utilization. The utilization of low-level optimizations further boosts performance. Another crucial component, called Manticore Buddy, is written in PHP and is utilized for high-level functionality that does not require lightning-fast response times or extremely high processing power. Although contributing to the C++ code may pose a challenge, adding a new SQL/JSON command using Manticore Buddy should be a straightforward process.

Real-Time inserts

Newly added or updated documents can be immediately read.

Interactive courses for easy learning

We offer free interactive courses to make learning effortless.

Transactions

While Manticore is not fully ACID-compliant, it supports isolated transactions for atomic changes and binary logging for safe writes.

Built-In replication and load balancing

Data can be distributed across servers and data centers with any Manticore Search node acting as both a load balancer and a data node. Manticore implements virtually synchronous multi-master replication using the Galera library, ensuring data consistency across all nodes, preventing data loss, and providing exceptional replication performance.

Built-in backup capabilities

Manticore is equipped with an external tool manticore-backup, and the BACKUP SQL command to simplify the process of backing up and restoring your data. Alternatively, you can use mysqldump to make logical backups.

Out-of-the-box data sync

The indexer tool and comprehensive configuration syntax of Manticore make it easy to sync data from sources like MySQL, PostgreSQL, ODBC-compatible databases, XML, and CSV.

Integration options

You can integrate Manticore Search with a MySQL/MariaDB server using the FEDERATED engine or via ProxySQL.

You can use Apache Superset and Grafana to visualize data stored in Manticore. Various MySQL tools can be used to develop Manticore queries interactively, such as HeidiSQL and DBForge.

Stream filtering made easy

Manticore offers a special table type, the "percolate" table, which allows you to search queries instead of data, making it an efficient tool for filtering full-text data streams. Simply store your queries in the table, process your data stream by sending each batch of documents to Manticore Search, and receive only the results that match your stored queries.

Possible applications:

Manticore has a variety of use cases, including:

Requirements

¶ 2

Read this first

About this manual

The manual is arranged to reflect the most likely way you would use Manticore:

Do not skip 1️⃣ 2️⃣ 3️⃣

Key sections of the manual are marked with 1️⃣, 2️⃣, 3️⃣ etc. in the menu for your convenience since their corresponding functionality is most used. If you are new to Manticore we highly recommend not skipping them.

Quick start guide

If you are looking for a quick understanding of how Manticore works in general ⚡ Quick start guide is a good place to start.

Using examples

Each query example has a little icon 📋 in the top-right corner:

Copy example

You can use it to copy examples to the clipboard. If the query is an HTTP request it will be copied as a CURL command. You can configure the host/port if you press ⚙️.

Search in this manual

We love search and we've made our best to make searching in this manual as convenient as possible. Of course it's backed by Manticore Search. Besides using the search bar which requires opening the manual first there is a very easy way to find something by just opening mnt.cr/your-search-keyword :

mnt.cr quick manual search

Best practices

There are few things you need to understand about Manticore Search that can help you follow the best practices of using it.

Real-time table vs plain table

Real-time mode vs plain mode

Manticore Search works in two modes:

You cannot combine the 2 modes and need to decide which one you want to follow by specifying data_dir in your configuration file (which is the default behaviour). If you are unsure our recommendation is to follow the RT mode as if even you need a plain table you can build it with a separate plain table config and import to your main Manticore instance.

Real-time tables can be used in both RT and plain modes. In the RT mode a real-time table is defined with a CREATE TABLE command, while in the plain mode it is defined in the configuration file. Plain (offline) tables are supported only in the plain mode. Plain tables cannot be created in the RT mode, but existing plain tables made in the plain mode can be converted to real-time tables and imported in the RT mode.

SQL vs JSON

Manticore provides multiple ways and interfaces to manage your schemas and data, but the two main are:

¶ 3

Installation

RHEL, Centos, Alma, Amazon, Oracle
sudo yum install https://repo.manticoresearch.com/manticore-repo.noarch.rpm
sudo yum install manticore manticore-extra
If you are upgrading from an older version, it is recommended to remove your old packages first to avoid conflicts caused by the updated package structure:
sudo yum --setopt=tsflags=noscripts remove manticore*
It won't remove your data. If you made changes to the configuration file, it will be saved to `/etc/manticoresearch/manticore.conf.rpmsave`. If you are looking for separate packages, please find them [here](https://manticoresearch.com/install/#separate-packages). For more details on the installation, see [below](../).
Debian, Ubuntu, Mint
wget https://repo.manticoresearch.com/manticore-repo.noarch.deb
sudo dpkg -i manticore-repo.noarch.deb
sudo apt update
sudo apt install manticore manticore-extra
If you are upgrading to Manticore 6 from an older version, it is recommended to remove your old packages first to avoid conflicts caused by the updated package structure:
sudo apt remove manticore*
It won't remove your data or configuration file. If you are looking for separate packages, please find them [here](https://manticoresearch.com/install/#separate-packages). For more details on the installation, see [below](../).
MacOS
brew install manticoresoftware/tap/manticoresearch manticoresoftware/tap/manticore-extra
Please find more details on the installation [below](../).
Windows
1. Download the [Manticore Search Installer](https://repo.manticoresearch.com/repository/manticoresearch_windows/release/x64/manticore-6.0.4-230314-1a3a4ea82-x64.exe) and run it. Follow the installation instructions. 2. Choose the directory to install to. 3. Select the components you want to install. We recommend installing all of them. 4. Manticore comes with a preconfigured `manticore.conf` file in [RT mode](https://manual.manticoresearch.com/#Real-time-mode-vs-plain-mode). No additional configuration is required. For more details on the installation, see [below](../#Installing-Manticore-in-Windows).
Docker
One-liner for a sandbox (not recommended for production use):
docker run -e EXTRA=1 --name manticore --rm -d manticoresearch/manticore && echo "Waiting for Manticore docker to start. Consider mapping the data_dir to make it start faster next time" && until docker logs manticore 2>&1 | grep -q "accepting connections"; do sleep 1; echo -n .; done && echo && docker exec -it manticore mysql && docker stop manticore
This runs Manticore container and waits for it to boot up. After starting, it launches a MySQL client session. When you exit the MySQL client, the Manticore container ceases operation and gets deleted, leaving no stored data behind. For details on how to utilize Manticore in a live production setting, refer to the following section. For production use:
docker run -e EXTRA=1 --name manticore -v $(pwd)/data:/var/lib/manticore -p 127.0.0.1:9306:9306 -p 127.0.0.1:9308:9308 -d manticoresearch/manticore
This setup will enable the Manticore Columnar Library and Manticore Buddy, and run Manticore on ports 9306 for MySQL connections and 9308 for all other connections, using `./data/` as the designated data directory. Read more about production use [in the documentation](https://github.com/manticoresoftware/docker#production-use).
¶ 3.1

Installing Manticore Search as a Docker image

Docker images of Manticore Search are publicly accessible on Docker Hub, built from the Manticore Search docker GitHub repository.

To retrieve the Manticore image, run the following command:

docker pull manticoresearch/manticore

For more information about using Manticore in Docker, see the Using Manticore in Docker section.

¶ 3.2

Installing Manticore packages on RedHat and CentOS

Supported releases:

YUM repository

The simplest method to install Manticore on RedHat/CentOS is by using our YUM repository:

Install the repository:

sudo yum install https://repo.manticoresearch.com/manticore-repo.noarch.rpm

Then install Manticore Search:

sudo yum install manticore manticore-extra

If you are upgrading to Manticore 6 from an older version, it is recommended to remove your old packages first to avoid conflicts caused by the updated package structure:

sudo yum remove manticore*

It won't remove your data and configuration file.

Development packages

If you prefer "Nightly" (development) versions do:

sudo yum -y install https://repo.manticoresearch.com/manticore-repo.noarch.rpm && \
sudo yum -y --enablerepo manticore-dev install manticore manticore-extra manticore-common manticore-server manticore-server-core manticore-tools manticore-executor manticore-buddy manticore-backup manticore-columnar-lib manticore-server-core-debuginfo manticore-tools-debuginfo manticore-columnar-lib-debuginfo  manticore-icudata manticore-galera manticore-galera-debuginfo

Standalone RPM packages

To download standalone RPM files from the Manticore repository, follow the instructions available at https://manticoresearch.com/install/.

More packages you may need

For indexer

If you plan to use indexer to create tables from external sources, you'll need to make sure you have installed corresponding client libraries in order to make available of indexing sources you want. The line below will install all of them at once; feel free to use it as is, or to reduce it to install only libraries you need (for only mysql sources - just mysql-libs should be enough, and unixODBC is not necessary).

sudo yum install mysql-libs postgresql-libs expat unixODBC

In CentOS Stream 8 you may need to run:

dnf install mariadb-connector-c

if you get error sql_connect: MySQL source wasn't initialized. Wrong name in dlopen? trying to build a plain table from MySQL.

Ukrainian lemmatizer

The lemmatizer requires Python 3.9+. Make sure you have it installed and that it's configured with --enable-shared.

Here's how to install Python 3.9 and the Ukrainian lemmatizer in Centos 7/8:

# install Manticore Search and UK lemmatizer from YUM repository
yum -y install https://repo.manticoresearch.com/manticore-repo.noarch.rpm
yum -y install manticore manticore-lemmatizer-uk

# install packages needed for building Python
yum groupinstall "Development Tools" -y
yum install openssl-devel libffi-devel bzip2-devel wget -y

# download, build and install Python 3.9
cd ~
wget https://www.python.org/ftp/python/3.9.2/Python-3.9.2.tgz
tar xvf Python-3.9.2.tgz
cd Python-3.9*/
./configure --enable-optimizations --enable-shared
make -j8 altinstall

# update linker cache
ldconfig

# install pymorphy2 and UK dictionary
pip3.9 install pymorphy2[fast]
pip3.9 install pymorphy2-dicts-uk
¶ 3.3

Installing Manticore in Debian or Ubuntu

Supported releases:

APT repository

The easiest way to install Manticore in Ubuntu/Debian/Mint is by using our APT repository.

Install the repository:

wget https://repo.manticoresearch.com/manticore-repo.noarch.deb
sudo dpkg -i manticore-repo.noarch.deb
sudo apt update

(install wget if it's not installed; install gnupg2 if apt-key fails).

Then install Manticore Search:

sudo apt install manticore manticore-extra

If you are upgrading to Manticore 6 from an older version, it is recommended to remove your old packages first to avoid conflicts caused by the updated package structure:

sudo apt remove manticore*

It won't remove your data and configuration file.

Development packages

If you prefer "Nightly" (development) versions do:

wget https://repo.manticoresearch.com/manticore-dev-repo.noarch.deb && \
sudo dpkg -i manticore-dev-repo.noarch.deb && \
sudo apt -y update && \
sudo apt -y install manticore manticore-extra manticore-common manticore-server manticore-server-core manticore-tools manticore-executor manticore-buddy manticore-backup manticore-columnar-lib manticore-server-core-dbgsym manticore-tools-dbgsym manticore-columnar-lib-dbgsym manticore-icudata-65l manticore-galera manticore-galera-dbgsym

Standalone DEB packages

To download standalone DEB files from the Manticore repository, follow the instructions available at https://manticoresearch.com/install/.

More packages you may need

For indexer

Manticore package depends on zlib and ssl libraries, nothing else is strictly required. However, if you plan to use indexer to create tables from external storages, you'll need to install appropriate client libraries. To find out what specific libraries indexer requires, run it and look at the top of its output:

$ sudo -u manticore indexer
Manticore 3.5.4 13f8d08d@201211 release
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
Copyright (c) 2017-2024, Manticore Software LTD (https://manticoresearch.com)

Built by gcc/clang v 5.4.0,

Built on Linux runner-0277ea0f-project-3858465-concurrent-0 4.19.78-coreos #1 SMP Mon Oct 14 22:56:39 -00 2019 x86_64 x86_64 x86_64 GNU/Linux

Configured by CMake with these definitions: -DCMAKE_BUILD_TYPE=RelWithDebInfo -DDISTR_BUILD=xenial -DUSE_SSL=ON -DDL_UNIXODBC=1 -DUNIXODBC_LIB=libodbc.so.2 -DDL_EXPAT=1 -DEXPAT_LIB=libexpat.so.1 -DUSE_LIBICONV=1 -DDL_MYSQL=1 -DMYSQL_LIB=libmysqlclient.so.20 -DDL_PGSQL=1 -DPGSQL_LIB=libpq.so.5 -DLOCALDATADIR=/var/data -DFULL_SHARE_DIR=/usr/share/manticore -DUSE_ICU=1 -DUSE_BISON=ON -DUSE_FLEX=ON -DUSE_SYSLOG=1 -DWITH_EXPAT=1 -DWITH_ICONV=ON -DWITH_MYSQL=1 -DWITH_ODBC=ON -DWITH_POSTGRESQL=1 -DWITH_RE2=1 -DWITH_STEMMER=1 -DWITH_ZLIB=ON -DGALERA_SOVERSION=31 -DSYSCONFDIR=/etc/manticoresearch

Here you can see mentions of libodbc.so.2, libexpat.so.1, libmysqlclient.so.20, and libpq.so.5.

Below is a reference table with a list of all the client libraries for different Debian/Ubuntu versions:

Distr MySQL PostgreSQL XMLpipe UnixODBC
Ubuntu Trusty libmysqlclient.so.18 libpq.so.5 libexpat.so.1 libodbc.so.1
Ubuntu Bionic libmysqlclient.so.20 libpq.so.5 libexpat.so.1 libodbc.so.2
Ubuntu Focal libmysqlclient.so.21 libpq.so.5 libexpat.so.1 libodbc.so.2
Ubuntu Hirsute libmysqlclient.so.21 libpq.so.5 libexpat.so.1 libodbc.so.2
Ubuntu Jammy libmysqlclient.so.21 libpq.so.5 libexpat.so.1 libodbc.so.2
Debian Jessie libmysqlclient.so.18 libpq.so.5 libexpat.so.1 libodbc.so.2
Debian Buster libmariadb.so.3 libpq.so.5 libexpat.so.1 libodbc.so.2
Debian Bullseye libmariadb.so.3 libpq.so.5 libexpat.so.1 libodbc.so.2
Debian Bookworm libmariadb.so.3 libpq.so.5 libexpat.so.1 libodbc.so.2

To find packages that provide the libraries, you can use, for example, apt-file:

apt-file find libmysqlclient.so.20
libmysqlclient20: /usr/lib/x86_64-linux-gnu/libmysqlclient.so.20
libmysqlclient20: /usr/lib/x86_64-linux-gnu/libmysqlclient.so.20.2.0
libmysqlclient20: /usr/lib/x86_64-linux-gnu/libmysqlclient.so.20.3.6

Note that you only need libraries for the types of storages you're going to use. So if you plan to build tables only from MySQL, then you might need to install only the MySQL library (in the above case libmysqlclient20).

Finally, install the needed packages:

sudo apt-get install libmysqlclient20 libodbc1 libpq5 libexpat1

If you aren't going to use the indexer tool at all, you don't need to find and install any libraries.

To enable CJK tokenization support, the official packages contain binaries with embedded ICU library and include ICU data file. They are independent from any ICU runtime library which might be available on your system, and can't be upgraded.

Ukrainian lemmatizer

The lemmatizer requires Python 3.9+. Make sure you have it installed and that it's configured with --enable-shared.

Here's how to install Python 3.9 and the Ukrainian lemmatizer on Debian and Ubuntu:

# install Manticore Search and UK lemmatizer from APT repository
cd ~
wget https://repo.manticoresearch.com/manticore-repo.noarch.deb
sudo dpkg -i manticore-repo.noarch.deb
sudo apt -y update
sudo apt -y install manticore manticore-lemmatizer-uk

# install packages needed for building Python
sudo apt -y update
sudo apt -y install wget build-essential libreadline-dev libncursesw5-dev libssl-dev libsqlite3-dev tk-dev libgdbm-dev libc6-dev libbz2-dev libffi-dev zlib1g-dev

# download, build and install Python 3.9
cd ~
wget https://www.python.org/ftp/python/3.9.4/Python-3.9.4.tgz
tar xzf Python-3.9.4.tgz
cd Python-3.9.4
./configure --enable-optimizations --enable-shared
sudo make -j8 altinstall

# update linker cache
sudo ldconfig

# install pymorphy2 and UK dictionary
sudo pip3.9 install pymorphy2[fast]
sudo pip3.9 install pymorphy2-dicts-uk
¶ 3.4

Installing Manticore on MacOS

Via Homebrew package manager

brew install manticoresoftware/tap/manticoresearch manticoresoftware/tap/manticore-extra

Start Manticore as a brew service:

brew services start manticoresearch

The default configuration file for Manticore is located at either /usr/local/etc/manticoresearch/manticore.conf or /opt/homebrew/etc/manticoresearch/manticore.conf.

If you plan to use indexer to fetch data from sources such as MySQL, PostgreSQL, or another database using ODBC, you may need additional libraries, such as mysql@5.7, libpq, and unixodbc, respectively.

Development packages

If you prefer "Nightly" (development) versions do:

brew tap manticoresoftware/tap-dev
brew install manticoresoftware/tap-dev/manticoresearch-dev manticoresoftware/tap-dev/manticore-extra-dev
brew services start manticoresearch-dev
¶ 3.5

Installing Manticore in Windows

  1. Download the Manticore Search Installer and run it. Follow the installation instructions.
  2. Choose the directory to install to.
  3. Select the components you want to install. We recommend installing all of them.
  4. Manticore comes with a preconfigured manticore.conf file in RT mode. No additional configuration is required.

Installing as a Windows service

To install searchd (Manticore Search server) as a Windows service, run:

\path\to\searchd.exe --install --config \path\to\config --servicename Manticore

Make sure to use the full path of the configuration file, otherwise searchd.exe will not be able to locate it when it starts as a service.

After installation, the service can be started from the Services snap-in of the Microsoft Management Console.

Once started, you can access Manticore using the MySQL command line interface:

mysql -P9306 -h127.0.0.1

Note that in most examples in this manual, we use -h0 to connect to the local host, but in Windows, you must use localhost or 127.0.0.1 explicitly.

¶ 3.6

Compiling Manticore from source

Compiling Manticore Search from sources enables custom build configurations, such as disabling certain features or adding new patches for testing. For example, you may want to compile from sources and disable the embedded ICU in order to use a different version installed on your system that can be upgraded independently of Manticore. This is also useful if you are interested in contributing to the Manticore Search project.

Building using CI Docker

To prepare official release and development packages, we use Docker and a special building image. This image includes essential tooling and is designed to be used with external sysroots, so one container can build packages for all operating systems. You can build the image using the Dockerfile and README or use an image from Docker Hub. This is the easiest way to create binaries for any supported operating system and architecture. You'll also need to specify the following environment variables when running the container:

To find possible values for DISTR and arch, you can use the directory https://repo.manticoresearch.com/repository/sysroots/roots_with_zstd/ as a reference, as it includes sysroots for all supported combinations.

After that, building packages inside the Docker container is as easy as calling:

cmake -DPACK=1 /path/to/sources
cmake --build .

For instance, to create a package for RHEL7-compatible operating systems that is similar to the official version Manticore Core Team provides, you should execute the following commands in the directory containing the Manticore Search sources. This directory is the root of a cloned repository from https://github.com/manticoresoftware/manticoresearch:

docker run -it --rm -e SYSROOT_URL=https://repo.manticoresearch.com/repository/sysroots \
-e arch=x86_64 \
-e DISTR=rhel7 \
-e boost=boost_rhel_feb17 \
-e sysroot=roots_nov22 \
-v $(pwd):/manticore_aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa \
manticoresearch/external_toolchain:clang16_cmake3263 bash

# following is to be run inside docker shell
cd /manticore_aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/
mkdir build && cd build
cmake -DPACK=1 ..

cmake --build .
# or if you want to build packages:
# cmake --build . --target package

The long source directory path is required or it may fail to build the sources.

The same process can be used to build binaries/packages not only for popular Linux distributions, but also for FreeBSD, Windows, and macOS.

Building manually

Compiling Manticore without using the building Docker is not recommended, but if you need to do it, here's what you may need to know:

Required tools

Fetching sources

From git

Manticore source code is hosted on GitHub.
To obtain the source code, clone the repository and then check out the desired branch or tag. The branch master represents the main development branch. Upon release, a versioned tag is created, such as 3.6.0 and a new branch for the current release is started, in this case manticore-3.6.0. The head of the versioned branch after all changes is used as source to build all binary releases. For example, to take sources of version 3.6.0 you can run:

git clone https://github.com/manticoresoftware/manticoresearch.git
cd manticoresearch
git checkout manticore-3.6.0

From archive

You can download the desired code from GitHub by using the "Download ZIP" button. Both .zip and .tar.gz formats are suitable.

wget -c https://github.com/manticoresoftware/manticoresearch/archive/refs/tags/3.6.0.tar.gz
tar -zxf 3.6.0.tar.gz
cd manticoresearch-3.6.0

Configuring

Manticore uses CMake. Assuming you are inside the root directory of the cloned repository:

mkdir build && cd build
cmake ..

CMake will investigate available features and configure the build according to them. By default, all features are considered enabled if they are available. The script also downloads and builds some external libraries, assuming that you want to use them. Implicitly, you get support for the maximal number of features.

You can also configure the build explicitly with flags and options. To enable feature FOO add -DFOO=1 to the CMake call.
To disable it, use -DFOO=0. If not explicitly noted, enabling a feature that is not available((such as WITH_GALERA on an MS Windows build)) will cause the configuration to fail with an error. Disabling a feature, apart from excluding it from the build, also disables its investigation on the system and disables the downloading/building of any related external libraries.

Configuration flags and options

Note, that some options are organized in triples: WITH_XXX, DL_XXX and XXX_LIB - like support of mysql, odbc, etc. WITH_XXX determines whether next two have an effect or not. I.e., if you set WITH_ODBC to 0 - there is no sence to provide DL_ODBC and ODBC_LIB, and these two will have no effect if the whole feature is disabled. Also, XXX_LIB has no sense without DL_XXX, because if you don't want DL_XXX option, dynamic loading will not be used, and name provided by XXX_LIB is useless. That is used by default introspection.

Also, using the iconv library assumes expat and is useless if the last is disabled.

Also, some libraries may be always available, and so, there is no sense to avoid linkage with them. For example, in Windows that is ODBC. On macOS that is Expat, iconv, and m.b. others. Default introspection determines such libraries and effectively emits only WITH_XXX for them, without DL_XXX and XXX_LIB, that makes the things simpler.

With some options in game configuring might look like:

mkdir build && cd build
cmake -DWITH_MYSQL=1 -DWITH_RE2=1 ..

Apart general configuration values, you may also investigate file CMakeCache.txt which is left in build folder right after you run configuration. Any values defined there might be redefined explicitly when running cmake. For example, you may run cmake -DHAVE_GETADDRINFO_A=FALSE ..., and that config run will not assume investigated value of that variable, but will use one you've provided.

Specific environment variables

Environment variables are useful for providing some kind of global settings which are stored aside from build configuration and are always present. For persistence, they may be set globally on the system using different ways - like adding them to the .bashrc file, or embedding them into a Dockerfile if you produce a docker-based build system, or writing them in system preferences environment variables on Windows. Also, you may set them short-lived using export VAR=value in the shell. Or even shorter, by prepending values to the cmake call, like CACHEB=/my/cache cmake ... - this way it will only work on this call and will not be visible on the next.

Some of such variables are known to be used in general by cmake and some other tools. That is things like CXX which determines the current C++ compiler, or CXX_FLAGS to provide compiler flags, etc.

However, we have some variables that are specific to manticore configuration, which are invented solely for our builds.

At the end of configuration, you may see what is available and will be used in a list like this one:

-- Enabled features compiled in:

* Galera, replication of tables
* re2, a regular expression library
* stemmer, stemming library (Snowball)
* icu, International Components for Unicode
* OpenSSL, for encrypted networking
* ZLIB, for compressed data and networking
* ODBC, for indexing MSSQL (windows) and generic ODBC sources with indexer
* EXPAT, for indexing xmlpipe sources with indexer
* Iconv, for support of different encodings when indexing xmlpipe sources with indexer
* MySQL, for indexing MySQL sources with indexer
* PostgreSQL, for indexing PostgreSQL sources with indexer

Building

cmake --build . --config RelWithDebInfo

Installation

To install run:

cmake --install . --config RelWithDebInfo

to install into custom (non-default) folder, run

cmake --install . --prefix path/to/build --config RelWithDebInfo

Building packages

For building a package, use the target package. It will build the package according to the selection provided by the -DDISTR_BUILD option. By default, it will be a simple .zip or .tgz archive with all binaries and supplementary files.

cmake --build . --target package --config RelWithDebInfo

Some advanced things about building

Recompilation (update) on single-config

If you haven't changed the path for sources and build, simply move to your build folder and run:

cmake .
cmake --build . --clean-first --config RelWithDebInfo

If by any reason it doesn't work, you can delete file CMakeCache.txt located in the build folder. After this step you
have to run cmake again, pointing to the source folder and configuring the options.

If it also doesn't help, just wipe out your build folder and begin from scratch.

Build types

Briefly - just use --config RelWithDebInfo as written above. It will make no mistake.

We use two build types. For development, it is Debug - it assigns compiler flags for optimization and other things in a way that it is very friendly for development, meaning the debug runs with step-by-step execution. However, the produced binaries are quite large and slow for production.

For releasing, we use another type - RelWithDebInfo - which means 'release build with debug info'. It produces production binaries with embedded debug info. The latter is then split away into separate debuginfo packages which are stored aside with release packages and might be used in case of some issues like crashes - for investigation and bugfixing. Cmake also provides Release and MinSizeRel, but we don't use them. If the build type is not available, cmake will make a noconfig build.

Build system generators

There are two types of generators: single-config and multi-config.

If you want to specify the build type but don't want to care about whether it is a 'single' or 'multi' config generator - just provide the necessary keys in both places. I.e., configure with -DCMAKE_BUILD_TYPE=Debug, and then build with --config Debug. Just be sure that both values are the same. If the target builder is a single-config, it will consume the configuration param. If it is multi-config, the configuration param will be ignored, but the correct build configuration will be selected by the --config key.

If you want RelWithDebInfo (i.e. just build for production) and know you're on a single-config platform (that is all, except Windows) - you can omit the --config flag on the cmake invocation. The default CMAKE_BUILD_TYPE=RelWithDebInfo will be configured then, and used. All the commands for 'building', 'installation' and 'building package' will become shorter then.

Explicitly select build system generators

Cmake is the tool that doesn't perform building by itself, but it generates rules for the local build system.
Usually, it determines the available build system well, but sometimes you might need to provide a generator explicitly. You
can run cmake -G and review the list of available generators.

cmake -G "Visual Studio 16 2019" ....
  ```
- On all other platforms - usually Unix Makefiles are used, but you can specify another one, such as Ninja, or Ninja Multi-Config, as:
  Multi-Config`, as:
```bash
  cmake -GNinja ...
  ```
  or
```bash
  cmake -G"Ninja Multi-Config" ...

Ninja Multi-Config is quite useful as it is really 'multi-config' and available on Linux/macOS/BSD. With this generator, you may shift the choosing of configuration type to build time, and also you may build several configurations in one and the same build folder, changing only the --config param.

Caveats

  1. If you want to finally build a full-featured RPM package, the path to the build directory must be long enough in order to correctly build debug symbols.
    Like /manticore012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789, for example. That is because RPM tools modify the path over compiled binaries when building debug info, and it can just write over existing room and won't allocate more. The aforementioned long path has 100 chars and that is quite enough for such a case.

External dependencies

Some libraries should be available if you want to use them.
- For indexing (indexer tool): expat, iconv, mysql, odbc, postgresql. Without them, you can only process tsv and csv sources.
- For serving queries (searchd daemon): openssl might be necessary.
- For all (required, mandatory!) we need the Boost library. The minimal version is 1.61.0, however, we build the binaries with a fresher version 1.75.0. Even more recent versions (like 1.76) should also be okay. On Windows, you can download pre-built Boost from their site (boost.org) and install it into the default suggested path (i.e. C:\\boost...). On MacOs, the one provided in brew is okay. On Linux, you can check the available version in official repositories, and if it doesn't match requirements, you can build from sources. We need the component 'context', you can also build components 'system' and 'program_options', they will be necessary if you also want to build Galera library from the sources. Look into dist/build_dockers/xxx/boost_175/Dockerfile for a short self-documented script/instruction on how to do it.

On the build system, you need the 'dev' or 'devel' versions of these packages installed (i.e. - libmysqlclient-devel, unixodbc-devel, etc. Look to our dockerfiles for the names of concrete packages).

On run systems, these packages should be present at least in the final (non-dev) variants. (devel variants usually larger, as they include not only target binaries, but also different development stuff like include headers, etc.).

Building on Windows

Apart from necessary prerequisites, you might need prebuilt expat, iconv, mysql, and postgresql client libraries. You have to either build them yourself or contact us to get our build bundle (a simple zip archive where the folder with these targets is located).

See what is compiled

Run indexer -h. It will show which features were configured and built (whether they're explicit or investigated, doesn't matter):

Built on Linux x86_64 by GNU 8.3.1 compiler.

Configured with these definitions: -DDISTR_BUILD=rhel8 -DUSE_SYSLOG=1 -DWITH_GALERA=1 -DWITH_RE2=1 -DWITH_RE2_FORCE_STATIC=1
-DWITH_STEMMER=1 -DWITH_STEMMER_FORCE_STATIC=1 -DWITH_ICU=1 -DWITH_ICU_FORCE_STATIC=1 -DWITH_SSL=1 -DWITH_ZLIB=1 -DWITH_ODBC=1 -DDL_ODBC=1
-DODBC_LIB=libodbc.so.2 -DWITH_EXPAT=1 -DDL_EXPAT=1 -DEXPAT_LIB=libexpat.so.1 -DWITH_ICONV=1 -DWITH_MYSQL=1 -DDL_MYSQL=1
-DMYSQL_LIB=libmariadb.so.3 -DWITH_POSTGRESQL=1 -DDL_POSTGRESQL=1 -DPOSTGRESQL_LIB=libpq.so.5 -DLOCALDATADIR=/var/lib/manticore/data
-DFULL_SHARE_DIR=/usr/share/manticore
¶ 3.7

Migration from Sphinx Search

Sphinx 2.x -> Manticore 2.x

Manticore Search 2.x maintains compatibility with Sphinxsearch 2.x and can load existing tables created by Sphinxsearch. In most cases, upgrading is just a matter of replacing the binaries.

Instead of sphinx.conf (in Linux normally located at /etc/sphinxsearch/sphinx.conf) Manticore by default uses /etc/manticoresearch/manticore.conf. It also runs under a different user and use different folders.

Systemd service name has changed from sphinx/sphinxsearch to manticore and the service runs under user manticore (Sphinx was using sphinx or sphinxsearch). It also uses a different folder for the PID file.

The folders used by default are /var/lib/manticore, /var/log/manticore, /var/run/manticore. You can still use the existing Sphinx config, but you need to manually change permissions for /var/lib/sphinxsearch and /var/log/sphinxsearch folders. Or, just rename globally 'sphinx' to 'manticore' in system files. If you use other folders (for data, wordforms files etc.) the ownership must be also switched to user manticore. The pid_file location should be changed to match the manticore.service to /var/run/manticore/searchd.pid.

If you want to use the Manticore folder instead, the table files need to be moved to the new data folder (/var/lib/manticore) and the permissions must be changed to user manticore.

Sphinx 2.x / Manticore 2.x -> Manticore 3.x

Upgrading from Sphinx / Manticore 2.x to 3.x is not straightforward, as the table storage engine has undergone a significant upgrade and the new searchd cannot load older tables and upgrade them to the new format on-the-fly.

Manticore Search 3 got a redesigned table storage. Tables created with Manticore/Sphinx 2.x cannot be loaded by Manticore Search 3 without a conversion. Because of the 4GB limitation, a real-time table in 2.x could still have several disk chunks after an optimize operation. After upgrading to 3.x, these tables can now be optimized to 1-disk chunk with the usual OPTIMIZE command. Index files also changed. The only component that didn't get any structural changes is the .spp file (hitlists). .sps (strings/json) and .spm (MVA) are now held by .spb (var-length attributes). The new format has an .spm file present, but it's used for row map (previously it was dedicated for MVA attributes). The new extensions added are .spt (docid lookup), .sphi ( secondary index histograms), .spds (document storage). In case you are using scripts that manipulate table files, they should be adapted for the new file extensions.

The upgrade procedure may differ depending on your setup (number of servers in the cluster, whether you have high availability or not, etc.), but in general, it involves creating new 3.x table versions and replacing your existing ones, as well as replacing older 2.x binaries with the new ones.

There are two special requirements to take care:

Manticore Search 3 includes a new tool - index_converter - that can convert Sphinx 2.x / Manticore 2.x tables to 3.x format. index_converter comes in a separate package which should be installed first. Using the convert tool create 3.x versions of your tables. index_converter can write the new files in the existing data folder and backup the old files or it can write the new files to a chosen folder.

Basic upgrade instruction

If you have a single server:

To minimize downtime, you can copy 2.x tables, config (you'll need to edit paths here for tables, logs, and different ports), and binaries to a separate location and start this on a separate port. Point your application to it. After upgrading to 3.0 and the new server is started, you can point the application back to the normal ports. If everything is good, stop the 2.x copy and delete the files to free up space.

If you have a spare box (like a testing or staging server), you can do the table upgrade there first and even install Manticore 3 to perform several tests. If everything is okay, copy the new table files to the production server. If you have multiple servers that can be pulled out of production, do it one by one and perform the upgrade on each. For distributed setups, 2.x searchd can work as a master with 3.x nodes, so you can do the upgrading on the data nodes first, and then on the master node.

There have been no changes made to the way clients should connect to the engine, or any changes to the querying mode or behavior of queries.

kill-lists in Sphinx / Manticore 2.x vs Manticore 3.x

Kill-lists have been redesigned in Manticore Search 3. In previous versions, kill-lists were applied to the result set provided by each previously searched table at query time.

Thus, in 2.x, the table order at query time mattered. For example, if a delta table had a kill-list, in order to apply it against the main table, the order had to be main, delta (either in a distributed table or in the FROM clause).

In Manticore 3, kill-lists are applied to a table when it's loaded during searchd startup or gets rotated. The new directive killlist_target in table configuration specifies target tables and defines which doc ids from the source table should be used for suppression. These can be ids from the defined kill-list, actual doc ids of the table or both.

Documents from the kill-lists are deleted from the target tables, they are not returned in results even if the search doesn't include the table that provided the kill-lists. Because of that, the order of tables for searching does not matter anymore. Now, delta, main and main, delta will provide the same results.

In previous versions, tables were rotated following the order from the configuration file. In Manticore 3 table rotation order is much smarter and works in accordance with killlist targets. Before starting to rotate tables, the server looks for chains of tables by killlist_target definitions. It will then first rotate tables not referenced anywhere as kill-lists targets. Next, it will rotate tables targeted by already rotated tables and so on. For example, if we do indexer --all and we have 3 tables: main, delta_big (which targets at the main) and delta_small (with target at delta_big), first, delta_small is rotated, then delta_big and finally the main. This is to ensure that when a dependent table is rotated it gets the most actual kill-list from other tables.

Configuration keys removed in Manticore 3.x

Updating var-length attributes in Manticore 3.x

String, JSON and MVA attributes can be updated in Manticore 3.x using UPDATE statement.

In 2.x string attributes required REPLACE, for JSON it was only possible to update scalar properties (as they were fixed-width) and MVAs could be updated using the MVA pool. Now updates are performed directly on the blob component. One setting that may require tuning is attr_update_reserve which allows changing the allocated extra space at the end of the blob used to avoid frequent resizes in case the new values are bigger than the existing values in the blob.

Document IDs in Manticore 3.x

Doc ids used to be UNSIGNED 64-bit integers. Now they are POSITIVE SIGNED 64-bit integers.

RT mode in Manticore 3.x

Read here about the RT mode

Special suffixes since Manticore 3.x

Manticore 3.x recognizes and parses special suffixes which makes easier to use numeric values with special meaning. Common form for them is integer number + literal, like 10k or 100d, but not 40.3s (since 40.3 is not integer), or not 2d 4h (since there are two, not one value). Literals are case-insensitive, so 10W is the same as 10w. There are 2 types of such suffixes currently supported:

index_converter

index_converter is a tool for converting tables created with Sphinx/Manticore Search 2.x to the Manticore Search 3.x table format. The tool can be used in several different ways:

Convert one table at a time

$ index_converter --config /home/myuser/manticore.conf --index tablename

Convert all tables

$ index_converter --config /home/myuser/manticore.conf --all

Convert tables found in a folder

$ index_converter  --path /var/lib/manticoresearch/data --all

The new version of the table is written by default in the same folder. The previous version's files are saved with the .old extension in their name. An exception is the .spp (hitlists) file, which is the only table component that didn't have any changes in the new format.

You can save the new table version to a different folder using the -–output-dir option

$ index_converter --config /home/myuser/manticore.conf --all --output-dir /new/path

Convert kill lists

A special case is for tables containing kill-lists. As the behaviour of how kill-lists works has changed (see killlist_target), the delta table should know which are the target tables for applying the kill-lists. There are 3 ways to have a converted table ready for setting targeted tables for applying kill-lists:

Complete list of index_converter options

Here's the complete list of index_converter options:

¶ 4

Quick start guide

Install and start Manticore

You can install and start Manticore easily on various operating systems, including Ubuntu, Centos, Debian, Windows, and MacOS. Additionally, you can also use Manticore as a Docker container.

Ubuntu

Ubuntu
wget https://repo.manticoresearch.com/manticore-repo.noarch.deb
sudo dpkg -i manticore-repo.noarch.deb
sudo apt update
sudo apt install manticore manticore-columnar-lib
sudo systemctl start manticore
### Debian
Debian
wget https://repo.manticoresearch.com/manticore-repo.noarch.deb
sudo dpkg -i manticore-repo.noarch.deb
sudo apt update
sudo apt install manticore manticore-columnar-lib
sudo systemctl start manticore
### Centos
Centos
sudo yum install https://repo.manticoresearch.com/manticore-repo.noarch.rpm
sudo yum install manticore manticore-columnar-lib
sudo systemctl start manticore
### Windows
Windows
* Download the Windows archive from https://manticoresearch.com/install/ * Extract all files from the archive to `C:\Manticore` * Run the following command to install Manticore as a service: * ```bash C:\Manticore\bin\searchd --install --config C:\Manticore\sphinx.conf.in --servicename Manticore ``` * Start Manticore from the Services snap-in of the Microsoft Management Console. ### MacOS
MacOS
brew install manticoresearch
brew services start manticoresearch
### Docker
Docker
docker pull manticoresearch/manticore
docker run -e EXTRA=1 --name manticore -p9306:9306 -p9308:9308 -p9312:9312 -d manticoresearch/manticore
For persisting your data directory, read [how to use Manticore docker in production](#Production-use)

Connect to Manticore

By default Manticore is waiting for your connections on:

More details about HTTPS support can be found in our learning course here.

SQL
mysql -h0 -P9306
HTTP
HTTP is a stateless protocol, so it doesn't require any special connection phase. You can simply send an HTTP request to the server and receive the response. To communicate with Manticore using the JSON interface, you can use any HTTP client library in your programming language of choice to send GET or POST requests to the server and parse the JSON responses:
curl -s "http://localhost:9308/search"
PHP
// https://github.com/manticoresoftware/manticoresearch-php
require_once __DIR__ . '/vendor/autoload.php';
$config = ['host'=>'127.0.0.1','port'=>9308];
$client = new \Manticoresearch\Client($config);
Python
// https://github.com/manticoresoftware/manticoresearch-python
import manticoresearch
config = manticoresearch.Configuration(
    host = "http://127.0.0.1:9308"
)
client = manticoresearch.ApiClient(config)
indexApi = manticoresearch.IndexApi(client)
searchApi = manticoresearch.SearchApi(client)
utilsApi = manticoresearch.UtilsApi(client)
Javascript
// https://github.com/manticoresoftware/manticoresearch-javascript
var Manticoresearch = require('manticoresearch');
var client= new Manticoresearch.ApiClient()
client.basePath="http://127.0.0.1:9308";
indexApi = new Manticoresearch.IndexApi(client);
searchApi = new Manticoresearch.SearchApi(client);
utilsApi = new Manticoresearch.UtilsApi(client);
Java
// https://github.com/manticoresoftware/manticoresearch-java
import com.manticoresearch.client.*;
import com.manticoresearch.client.model.*;
import com.manticoresearch.client.api.*;
...
ApiClient client = Configuration.getDefaultApiClient();
client.setBasePath("http://127.0.0.1:9308");
...
IndexApi indexApi = new IndexApi(client);
SearchApi searchApi = new UtilsApi(client);
UtilsApi utilsApi = new UtilsApi(client);
C#
// https://github.com/manticoresoftware/manticoresearch-net
using System.Net.Http;
...
using ManticoreSearch.Client;
using ManticoreSearch.Api;
using ManticoreSearch.Model;
...
config = new Configuration();
config.BasePath = "http://localhost:9308";
httpClient = new HttpClient();
httpClientHandler = new HttpClientHandler();
...
var indexApi = new IndexApi(httpClient, config, httpClientHandler);
var searchApi = new SearchApi(httpClient, config, httpClientHandler);
var utilsApi = new UtilsApi(httpClient, config, httpClientHandler);
Typescript
import {
  Configuration,
  IndexApi,
  SearchApi,
  UtilsApi
} from "manticoresearch-ts";
...
const config = new Configuration({
  basePath: 'http://localhost:9308',
})
const indexApi = new IndexApi(config);
const searchApi = new SearchApi(config);
const utilsApi = new UtilsApi(config);
Go
import (
    "context"
    manticoreclient "github.com/manticoresoftware/manticoresearch-go"
)
...
configuration := manticoreclient.NewConfiguration()
configuration.Servers[0].URL = "http://localhost:9308"
apiClient := manticoreclient.NewAPIClient(configuration)

Create a table

Let's now create a table called "products" with 2 fields:

Note that it is possible to omit creating a table with an explicit create statement. For more information, see Auto schema.

More information about different ways to create a table can be found in our learning courses:

SQL
create table products(title text, price float) morphology='stem_en';
Query OK, 0 rows affected (0.02 sec)
HTTP
POST /cli -d "create table products(title text, price float) morphology='stem_en'"
{
"total":0,
"error":"",
"warning":""
}
PHP
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
    'title'=>['type'=>'text'],
    'price'=>['type'=>'float'],
],['morphology' => 'stem_en']);
Python
utilsApi.sql('create table products(title text, price float) morphology=\'stem_en\'')
Javascript
res = await utilsApi.sql('create table products(title text, price float) morphology=\'stem_en\'');
Java
utilsApi.sql("create table products(title text, price float) morphology='stem_en'");
C#
utilsApi.Sql("create table products(title text, price float) morphology='stem_en'");
TypeScript
res = await utilsApi.sql('create table products(title text, price float) morphology=\'stem_en\'');
Go
res := apiClient.UtilsAPI.Sql(context.Background()).Body("create table products(title text, price float) morphology='stem_en'").Execute();

Add documents

Let's now add few documents to the table:

SQL
insert into products(title,price) values ('Crossbody Bag with Tassel', 19.85), ('microfiber sheet set', 19.99), ('Pet Hair Remover Glove', 7.99);
Query OK, 3 rows affected (0.01 sec)
JSON
`"id":0` or no id forces automatic ID generation.
POST /insert
{
  "index":"products",
  "doc":
  {
    "title" : "Crossbody Bag with Tassel",
    "price" : 19.85
  }
}


POST /insert
{
  "index":"products",
  "doc":
  {
    "title" : "microfiber sheet set",
    "price" : 19.99
  }
}

POST /insert
{
  "index":"products",
  "doc":
  {
    "title" : "Pet Hair Remover Glove",
    "price" : 7.99
  }
}
{
  "_index": "products",
  "_id": 0,
  "created": true,
  "result": "created",
  "status": 201
}

{
  "_index": "products",
  "_id": 0,
  "created": true,
  "result": "created",
  "status": 201
}

{
  "_index": "products",
  "_id": 0,
  "created": true,
  "result": "created",
  "status": 201
}
PHP
$index->addDocuments([
        ['title' => 'Crossbody Bag with Tassel', 'price' => 19.85],
        ['title' => 'microfiber sheet set', 'price' => 19.99],
        ['title' => 'Pet Hair Remover Glove', 'price' => 7.99]
]);
Python
indexApi.insert({"index" : "products", "doc" : {"title" : "Crossbody Bag with Tassel", "price" : 19.85}})
indexApi.insert({"index" : "products", "doc" : {"title" : "microfiber sheet set", "price" : 19.99}})
indexApi.insert({"index" : "products", "doc" : {"title" : "Pet Hair Remover Glove", "price" : 7.99}})
Javascript
res = await indexApi.insert({"index" : "products", "doc" : {"title" : "Crossbody Bag with Tassel", "price" : 19.85}});
res = await indexApi.insert({"index" : "products", "doc" : {"title" : "microfiber sheet set", "price" : 19.99}});
res = await indexApi.insert({"index" : "products", "doc" : {"title" : "Pet Hair Remover Glove", "price" : 7.99}});
Java
InsertDocumentRequest newdoc = new InsertDocumentRequest();
HashMap<String,Object> doc = new HashMap<String,Object>(){{
    put("title","Crossbody Bag with Tassel");
    put("price",19.85);
}};
newdoc.index("products").setDoc(doc);
sqlresult = indexApi.insert(newdoc);

newdoc = new InsertDocumentRequest();
doc = new HashMap<String,Object>(){{
    put("title","microfiber sheet set");
    put("price",19.99);
}};
newdoc.index("products").setDoc(doc);
sqlresult = indexApi.insert(newdoc);

newdoc = new InsertDocumentRequest();
doc = new HashMap<String,Object>(){{
    put("title","Pet Hair Remover Glove");
    put("price",7.99);
 }};
newdoc.index("products").setDoc(doc);
indexApi.insert(newdoc);
C#
Dictionary<string, Object> doc = new Dictionary<string, Object>();
doc.Add("title","Crossbody Bag with Tassel");
doc.Add("price",19.85);
InsertDocumentRequest insertDocumentRequest = new InsertDocumentRequest(index: "products", doc: doc);
sqlresult = indexApi.Insert(insertDocumentRequest);

doc = new Dictionary<string, Object>();
doc.Add("title","microfiber sheet set");
doc.Add("price",19.99);
insertDocumentRequest = new InsertDocumentRequest(index: "products", doc: doc);
sqlresult = indexApi.Insert(insertDocumentRequest);

doc = new Dictionary<string, Object>();
doc.Add("title","Pet Hair Remover Glove");
doc.Add("price",7.99);
insertDocumentRequest = new InsertDocumentRequest(index: "products", doc: doc);
sqlresult = indexApi.Insert(insertDocumentRequest);
TypeScript
res = await indexApi.insert({
  index: 'test',
  id: 1,
  doc: { content: 'Text 1', name: 'Doc 1', cat: 1 },
});
res = await indexApi.insert({
  index: 'test',
  id: 2,
  doc: { content: 'Text 2', name: 'Doc 2', cat: 2 },
});
res = await indexApi.insert({
  index: 'test',
  id: 3,
  doc: { content: 'Text 3', name: 'Doc 3', cat: 7 },
});
Go
indexDoc := map[string]interface{} {"content": "Text 1", "name": "Doc 1", "cat": 1 }
indexReq := manticoreclient.NewInsertDocumentRequest("products", indexDoc)
indexReq.SetId(1)
apiClient.IndexAPI.Insert(context.Background()).InsertDocumentRequest(*indexReq).Execute()

indexDoc = map[string]interface{} {"content": "Text 2", "name": "Doc 3", "cat": 2 }
indexReq = manticoreclient.NewInsertDocumentRequest("products", indexDoc)
indexReq.SetId(2)
apiClient.IndexAPI.Insert(context.Background()).InsertDocumentRequest(*indexReq).Execute()

indexDoc = map[string]interface{} {"content": "Text 3", "name": "Doc 3", "cat": 7 }     
indexReq = manticoreclient.NewInsertDocumentRequest("products", indexDoc)
indexReq.SetId(3)
apiClient.IndexAPI.Insert(context.Background()).InsertDocumentRequest(*indexReq).Execute()

More details on the subject can be found here:

Let's find one of the documents. The query we will use is 'remove hair'. As you can see, it finds a document with the title 'Pet Hair Remover Glove' and highlights 'Hair remover' in it, even though the query has "remove", not "remover". This is because when we created the table, we turned on using English stemming (morphology "stem_en").

SQL
select id, highlight(), price from products where match('remove hair');
+---------------------+-------------------------------+----------+
| id                  | highlight()                   | price    |
+---------------------+-------------------------------+----------+
| 1513686608316989452 | Pet <b>Hair Remover</b> Glove | 7.990000 |
+---------------------+-------------------------------+----------+
1 row in set (0.00 sec)
JSON
POST /search
{
  "index": "products",
  "query": { "match": { "title": "remove hair" } },
  "highlight":
  {
    "fields": ["title"]
  }
}
{
  "took": 0,
  "timed_out": false,
  "hits": {
    "total": 1,
    "hits": [
      {
        "_id": "1513686608316989452",
        "_score": 1680,
        "_source": {
          "price": 7.99,
          "title": "Pet Hair Remover Glove"
        },
        "highlight": {
          "title": [
            "Pet <b>Hair Remover</b> Glove"
          ]
        }
      }
    ]
  }
}
PHP
$result = $index->search('@title remove hair')->highlight(['title'])->get();
foreach($result as $doc)
{
    echo "Doc ID: ".$doc->getId()."\n";
    echo "Doc Score: ".$doc->getScore()."\n";
    echo "Document fields:\n";
    print_r($doc->getData());
    echo "Highlights: \n";
    print_r($doc->getHighlight());
}
Doc ID: 1513686608316989452
Doc Score: 1680
Document fields:
Array
(
    [price] => 7.99
    [title] => Pet Hair Remover Glove
)
Highlights:
Array
(
    [title] => Array
        (
            [0] => Pet <b>Hair Remover</b> Glove
        )
)
` Python
Python
searchApi.search({"index":"products","query":{"query_string":"@title remove hair"},"highlight":{"fields":["title"]}})
{'hits': {'hits': [{u'_id': u'1513686608316989452',
                    u'_score': 1680,
                    u'_source': {u'title': u'Pet Hair Remover Glove', u'price':7.99},
                    u'highlight':{u'title':[u'Pet <b>Hair Remover</b> Glove']}}}],
          'total': 1},
 'profile': None,
 'timed_out': False,
 'took': 0}
javascript
javascript
res = await searchApi.search({"index":"products","query":{"query_string":"@title remove hair"}"highlight":{"fields":["title"]}});
{"hits": {"hits": [{"_id": "1513686608316989452",
                    "_score": 1680,
                    "_source": {"title": "Pet Hair Remover Glove", "price":7.99},
                    "highlight":{"title":["Pet <b>Hair Remover</b> Glove"]}}],
          "total": 1},
 "profile": None,
 "timed_out": False,
 "took": 0}
java
Java
query = new HashMap<String,Object>();
query.put("query_string","@title remove hair");
searchRequest = new SearchRequest();
searchRequest.setIndex("forum");
searchRequest.setQuery(query);
HashMap<String,Object> highlight = new HashMap<String,Object>(){{
    put("fields",new String[] {"title"});

}};
searchRequest.setHighlight(highlight);
searchResponse = searchApi.search(searchRequest);
class SearchResponse {
    took: 84
    timedOut: false
    hits: class SearchResponseHits {
        total: 1
        maxScore: null
        hits: [{_id=1513686608316989452, _score=1, _source={price=7.99, title=Pet Hair Remover Glove}, highlight={title=[Pet <b>Hair Remover</b> Glove]}}]
        aggregations: null
    }
    profile: null
}
C#
C#
object query =  new { query_string="@title remove hair" };
var searchRequest = new SearchRequest("products", query);
var highlight = new Highlight();
highlight.Fieldnames = new List<string> {"title"};
searchRequest.Highlight = highlight;
searchResponse = searchApi.Search(searchRequest);
class SearchResponse {
    took: 103
    timedOut: false
    hits: class SearchResponseHits {
        total: 1
        maxScore: null
        hits: [{_id=1513686608316989452, _score=1, _source={price=7.99, title=Pet Hair Remover Glove}, highlight={title=[Pet <b>Hair Remover</b> Glove]}}]
        aggregations: null
    }
    profile: null
}
TypeScript
TypeScript
res = await searchApi.search({
  index: 'test',
  query: { query_string: {'text 1'} },
  highlight: {'fields': ['content'] }
});
{
    "hits": 
    {
        "hits": 
        [{
            "_id": "1",
            "_score": 1400,
            "_source": {"content":"Text 1","name":"Doc 1","cat":1},
            "highlight": {"content":["<b>Text 1</b>"]}
        }],
        "total": 1
    },
    "profile": None,
    "timed_out": False,
    "took": 0
}
Go
Go
searchRequest := manticoreclient.NewSearchRequest("test")
query := map[string]interface{} {"query_string": "text 1"};
searchRequest.SetQuery(query);

highlightField := manticoreclient.NewHighlightField("content")
fields := []interface{}{ highlightField }
highlight := manticoreclient.NewHighlight()
highlight.SetFields(fields)
searchRequest.SetHighlight(highlight);

res, _, _ := apiClient.SearchAPI.Search(context.Background()).SearchRequest(*searchRequest).Execute()
{
    "hits": 
    {
        "hits": 
        [{
            "_id": "1",
            "_score": 1400,
            "_source": {"content":"Text 1","name":"Doc 1","cat":1},
            "highlight": {"content":["<b>Text 1</b>"]}
        }],
        "total": 1
    },
    "profile": None,
    "timed_out": False,
    "took": 0
}

More information on different search options available in Manticore can be found in our learning courses:

Update

Let's assume we now want to update the document - change the price to 18.5. This can be done by filtering by any field, but normally you know the document id and update something based on that.

SQL
update products set price=18.5 where id = 1513686608316989452;
Query OK, 1 row affected (0.00 sec)
JSON
POST /update
{
  "index": "products",
  "id": 1513686608316989452,
  "doc":
  {
    "price": 18.5
  }
}
{
  "_index": "products",
  "_id": 1513686608316989452,
  "result": "updated"
}
PHP
$doc = [
    'body' => [
        'index' => 'products',
        'id' => 2,
        'doc' => [
            'price' => 18.5
        ]
    ]
];

$response = $client->update($doc);
Python
indexApi = api = manticoresearch.IndexApi(client)
indexApi.update({"index" : "products", "id" : 1513686608316989452, "doc" : {"price":18.5}})
javascript
res = await indexApi.update({"index" : "products", "id" : 1513686608316989452, "doc" : {"price":18.5}});
Java
UpdateDocumentRequest updateRequest = new UpdateDocumentRequest();
doc = new HashMap<String,Object >(){{
    put("price",18.5);
}};
updateRequest.index("products").id(1513686608316989452L).setDoc(doc);
indexApi.update(updateRequest);
C#
Dictionary<string, Object> doc = new Dictionary<string, Object>();
doc.Add("price", 18.5);
UpdateDocumentRequest updateDocumentRequest = new UpdateDocumentRequest(index: "products", id: 1513686608316989452L, doc: doc);
indexApi.Update(updateDocumentRequest);
TypeScript
res = await indexApi.update({ index: "test", id: 1, doc: { cat: 10 } });
Go
updDoc = map[string]interface{} {"cat": 10}
updRequest = manticoreclient.NewUpdateDocumentRequest("test", updDoc)
updRequest.SetId(1)
res, _, _ = apiClient.IndexAPI.Update(context.Background()).UpdateDocumentRequest(*updRequest).Execute()

Delete

Let's now delete all documents with price lower than 10.

SQL
delete from products where price < 10;
Query OK, 1 row affected (0.00 sec)
JSON
POST /delete
{
  "index": "products",
  "query":
  {
    "range":
    {
      "price":
      {
        "lte": 10
      }
    }
  }
}
{
  "_index": "products",
  "deleted": 1
}
PHP
$result = $index->deleteDocuments(new \Manticoresearch\Query\Range('price',['lte'=>10]));
Array
(
    [_index] => products
    [deleted] => 1
)
Python
indexApi.delete({"index" : "products", "query": {"range":{"price":{"lte":10}}}})
javascript
res = await indexApi.delete({"index" : "products", "query": {"range":{"price":{"lte":10}}}});
Java
DeleteDocumentRequest deleteRequest = new DeleteDocumentRequest();
query = new HashMap<String,Object>();
query.put("range",new HashMap<String,Object>(){{
    put("price",new HashMap<String,Object>(){{
        put("lte",10);
    }});
}});
deleteRequest.index("products").setQuery(query);
indexApi.delete(deleteRequest);
C#
Dictionary<string, Object> price = new Dictionary<string, Object>();
price.Add("lte", 10);
Dictionary<string, Object> range = new Dictionary<string, Object>();
range.Add("price", price);
DeleteDocumentRequest deleteDocumentRequest = new DeleteDocumentRequest(index: "products", range: range);
indexApi.Delete(deleteDocumentRequest);
TypeScript
res = await indexApi.delete({
  index: 'test',
  query: { match: { '*': 'Text 1' } },
});
Go
delRequest := manticoreclient.NewDeleteDocumentRequest("test")
matchExpr := map[string]interface{} {"*": "Text 1t"}
delQuery := map[string]interface{} {"match": matchExpr }
delRequest.SetQuery(delQuery)
res, _, _ := apiClient.IndexAPI.Delete(context.Background()).DeleteDocumentRequest(*delRequest).Execute();
¶ 5

Starting the server

Manticore Search server can be started using different methods, depending on the installation type.

¶ 5.1

Starting Manticore in Linux

Starting and stopping using systemd

After the installation the Manticore Search service is not started automatically. To start Manticore run the following command:

sudo systemctl start manticore

To stop Manticore run the following command:

sudo systemctl stop manticore

The Manticore service is set to run at boot. You can check it by running:

sudo systemctl is-enabled manticore

If you want to disable Manticore from starting at boot time, run:

sudo systemctl disable manticore

To make Manticore start at boot, run:

sudo systemctl enable manticore

searchd process logs startup information in systemd journal. If systemd logging is enabled you can view the logged information with the following command:

sudo journalctl -u manticore

Custom startup flags using systemd

systemctl set-environment _ADDITIONAL_SEARCHD_PARAMS allows you to specify custom startup flags that the Manticore Search daemon should be started with. See full list here.

For example, to start Manticore with the debug logging level, you can run:

systemctl set-environment _ADDITIONAL_SEARCHD_PARAMS='--logdebug'
systemctl restart manticore

To undo it, run:

systemctl set-environment _ADDITIONAL_SEARCHD_PARAMS=''
systemctl restart manticore

Note, systemd environment variables get reset on server reboot.

Starting and stopping using service

Manticore can be started and stopped using service commands:

sudo service manticore start
sudo service manticore stop

To enable the sysV service at boot on RedHat systems run:

chkconfig manticore on

To enable the sysV service at boot on Debian systems (including Ubuntu) run:

update-rc.d manticore defaults

Please note that searchd is started by the init system under the manticore user and all files created by the server will be owned by this user. If searchd is started under, for example, the root user, the file permissions will be changed, which may result in issues when running searchd as a service again.

¶ 5.2

Starting Manticore manually

You can also start Manticore Search by calling searchd (Manticore Search server binary) directly:

searchd [OPTIONS]

Note that without specifying a path to the configuration file, searchd will try to find it in several locations depending on the operating system.

searchd command line options

The options available to searchd in all operating systems are:

bash $ searchd --config /etc/manticoresearch/manticore.conf --stop

bash $ searchd --config /etc/manticoresearch/manticore.conf --stopwait

Possible exit codes are as follows:

bash $ searchd --status $ searchd --config /etc/manticoresearch/manticore.conf --status

bash $ searchd --console --pidfile

bash $ searchd --config /etc/manticoresearch/manticore.conf --console

bash $ searchd --config /etc/manticoresearch/manticore.conf --iostats

bash $ searchd --config /etc/manticoresearch/manticore.conf --cpustats

An example of usage:

bash $ searchd --port 9313

bash $ searchd --config /etc/manticoresearch/manticore.conf --coredump

Windows options

There are some options for searchd that are specific to Windows platforms, concerning handling as a service, and are only available in Windows binaries.

Note that in Windows searchd will default to --console mode, unless you install it as a service.

bat C:\WINDOWS\system32> C:\Manticore\bin\searchd.exe --install --config C:\Manticore\manticore.conf

If you want to have the I/O stats every time you start searchd, you need to specify the option on the same line as the --install command thus:

bat C:\WINDOWS\system32> C:\Manticore\bin\searchd.exe --install --config C:\Manticore\manticore.conf --iostats

bat C:\WINDOWS\system32> C:\Manticore\bin\searchd.exe --delete

Plugin dir

Manticore utilizes the plugin_dir for storing and using Manticore Buddy plugins. By default, this value is accessible to the "manticore" user in a standard installation. However, if you start the searchd daemon manually with a different user, the daemon might not have access to the plugin_dir. To address this problem, ensure you specify a plugin_dir in the common section that the user running the searchd daemon can write to.

Signals

searchd supports a number of signals:

Environment variables

¶ 5.3

Starting and using Manticore in Docker

The image is based on current release of Manticore package.

The default configuration includes a sample Real-Time table and listens on the default ports:

The image comes with libraries for easy indexing data from MySQL, PostgreSQL, XML and CSV files.

How to run Manticore Search Docker image

Quick usage

The below is the simplest way to start Manticore in a container and log in to it via the mysql client:

docker run -e EXTRA=1 --name manticore --rm -d manticoresearch/manticore && echo "Waiting for Manticore docker to start. Consider mapping the data_dir to make it start faster next time" && until docker logs manticore 2>&1 | grep -q "accepting connections"; do sleep 1; echo -n .; done && echo && docker exec -it manticore mysql && docker stop manticore

Note that upon exiting the MySQL client, the Manticore container will be stopped and removed, resulting in no saved data. For information on using Manticore in a production environment, please see below.

The image comes with a sample table that can be loaded like this:

mysql> source /sandbox.sql

Also, the mysql client has several sample queries in its history that you can run on the above table, just use Up/Down keys in the client to see and run them.

Production use

Ports and mounting points

For data persistence the folder /var/lib/manticore/ should be mounted to local storage or other desired storage engine.

The configuration file inside the instance is located at /etc/manticoresearch/manticore.conf. For custom settings, this file should be mounted to your own configuration file.

The ports are 9306/9308/9312 for SQL/HTTP/Binary, expose them depending on how you are going to use Manticore. For example:

docker run -e EXTRA=1 --name manticore -v $(pwd)/data:/var/lib/manticore -p 127.0.0.1:9306:9306 -p 127.0.0.1:9308:9308 -d manticoresearch/manticore

or

docker run -e EXTRA=1 --name manticore -v $(pwd)/manticore.conf:/etc/manticoresearch/manticore.conf -v $(pwd)/data:/var/lib/manticore/ -p 127.0.0.1:9306:9306 -p 127.0.0.1:9308:9308 -d manticoresearch/manticore

Make sure to remove 127.0.0.1: if you want the ports to be available for external hosts.

Manticore Columnar Library and Manticore Buddy

The Manticore Search Docker image doesn't come with the Manticore Columnar Library pre-installed, which is necessary if you require columnar storage and secondary indexes. However, it can easily be enabled during runtime by setting the environment variable EXTRA=1. For example, docker run -e EXTRA=1 ... manticoresearch/manticore. This will download and install the library in the data directory (which is typically mapped as a volume in production environments) and it won't be re-downloaded unless the Manticore Search version is changed.

Using EXTRA=1 also activates Manticore Buddy, which is used for processing certain commands. For more information, refer to the changelog.

If you only need the MCL, you can use the environment variable MCL=1.

Docker-compose

In many cases, you may want to use Manticore in conjunction with other images specified in a Docker Compose YAML file. Below is the minimal recommended configuration for Manticore Search in a docker-compose.yml file:

version: '2.2'

services:
  manticore:
    container_name: manticore
    image: manticoresearch/manticore
    environment:
      - EXTRA=1
    restart: always
    ports:
      - 127.0.0.1:9306:9306
      - 127.0.0.1:9308:9308
    ulimits:
      nproc: 65535
      nofile:
         soft: 65535
         hard: 65535
      memlock:
        soft: -1
        hard: -1
    volumes:
      - ./data:/var/lib/manticore
#      - ./manticore.conf:/etc/manticoresearch/manticore.conf # uncomment if you use a custom config

Besides using the exposed ports 9306 and 9308, you can log into the instance by running docker-compose exec manticore mysql.

HTTP protocol

HTTP protocol is exposed on port 9308. You can map the port locally and connect using curl.:

docker run -e EXTRA=1 --name manticore -p 9308:9308 -d manticoresearch/manticore

Create a table:

JSON
POST /cli -d 'CREATE TABLE testrt ( title text, content text, gid integer)'

Insert a document:

JSON
POST /insert
-d'{"index":"testrt","id":1,"doc":{"title":"Hello","content":"world","gid":1}}'

Perform a simple search:

JSON
POST /search -d '{"index":"testrt","query":{"match":{"*":"hello world"}}}'

Logging

By default, the server is set to send its logging to /dev/stdout, which can be viewed from the host with:

docker logs manticore

The query log can be diverted to Docker log by passing the variable QUERY_LOG_TO_STDOUT=true.

Multi-node cluster with replication

Here is a simple docker-compose.yml for defining a two node cluster:

version: '2.2'

services:

  manticore-1:
    image: manticoresearch/manticore
    environment:
      - EXTRA=1
    restart: always
    ulimits:
      nproc: 65535
      nofile:
         soft: 65535
         hard: 65535
      memlock:
        soft: -1
        hard: -1
    networks:
      - manticore
  manticore-2:
    image: manticoresearch/manticore
    environment:
      - EXTRA=1
    restart: always
    ulimits:
      nproc: 65535
      nofile:
        soft: 65535
        hard: 65535
      memlock:
        soft: -1
        hard: -1
    networks:
      - manticore
networks:
  manticore:
    driver: bridge

mysql> CREATE TABLE testrt ( title text, content text, gid integer);

mysql> CREATE CLUSTER posts;
Query OK, 0 rows affected (0.24 sec)

mysql> ALTER CLUSTER posts ADD testrt;
Query OK, 0 rows affected (0.07 sec)

MySQL [(none)]> exit
Bye
```

mysql> JOIN CLUSTER posts AT 'manticore-1:9312';
mysql> INSERT INTO posts:testrt(title,content,gid) VALUES('hello','world',1);
Query OK, 1 row affected (0.00 sec)

MySQL [(none)]> exit
Bye
```

MySQL [(none)]> select * from testrt;
+---------------------+------+-------+---------+
| id | gid | title | content |
+---------------------+------+-------+---------+
| 3891565839006040065 | 1 | hello | world |
+---------------------+------+-------+---------+
1 row in set (0.00 sec)

MySQL [(none)]> exit
Bye
```

Memory locking and limits

It's recommended to overwrite the default ulimits of docker for the Manticore instance:

 --ulimit nofile=65536:65536

For best performance, table components can be "mlocked" into memory. When Manticore is run under Docker, the instance requires additional privileges to allow memory locking. The following options must be added when running the instance:

  --cap-add=IPC_LOCK --ulimit memlock=-1:-1

Configuring Manticore Search with Docker

If you want to run Manticore with a custom configuration that includes table definitions, you will need to mount the configuration to the instance:

docker run -e EXTRA=1 --name manticore -v $(pwd)/manticore.conf:/etc/manticoresearch/manticore.conf -v $(pwd)/data/:/var/lib/manticore -p 127.0.0.1:9306:9306 -d manticoresearch/manticore

Take into account that Manticore search inside the container is run under user manticore. Performing operations with table files (like creating or rotating plain tables) should be also done under manticore. Otherwise the files will be created under root and the search daemon won't have rights to open them. For example here is how you can rotate all tables:

docker exec -it manticore gosu manticore indexer --all --rotate

You can also set individual searchd and common configuration settings using Docker environment variables.

The settings must be prefixed with their section name, example for in case of mysql_version_string the variable must be named searchd_mysql_version_string:

docker run -e EXTRA=1 --name manticore  -p 127.0.0.1:9306:9306  -e searchd_mysql_version_string='5.5.0' -d manticoresearch/manticore

In case of the listen directive, new listening interfaces using the Docker variable searchd_listen in addition to the default ones. Multiple interfaces can be declared, separated by a semi-colon ("|"). To listen only on a network address, the $ip (retrieved internally from hostname -i) can be used as address alias.

For example -e searchd_listen='9316:http|9307:mysql|$ip:5443:mysql_vip' will add an additional SQL interface on port 9307, an SQL VIP listener on port 5443 running only on the instance's IP, and an HTTP listener on port 9316, in addition to the defaults on 9306 and 9308, respectively.

$ docker run -e EXTRA=1 --rm -p 1188:9307  -e searchd_mysql_version_string='5.5.0' -e searchd_listen='9316:http|9307:mysql|$ip:5443:mysql_vip'  manticore
[Mon Aug 17 07:31:58.719 2020] [1] using config file '/etc/manticoresearch/manticore.conf' (9130 chars)...
listening on all interfaces for http, port=9316
listening on all interfaces for mysql, port=9307
listening on 172.17.0.17:5443 for VIP mysql
listening on all interfaces for mysql, port=9306
listening on UNIX socket /var/run/mysqld/mysqld.sock
listening on 172.17.0.17:9312 for sphinx
listening on all interfaces for http, port=9308
prereading 0 indexes
prereaded 0 indexes in 0.000 sec
accepting connections

Startup flags

To start Manticore with custom startup flags, specify them as arguments when using docker run. Ensure you do not include the searchd command and include the --nodetach flag. Here's an example:

docker run -e EXTRA=1 --name manticore --rm manticoresearch/manticore:latest --replay-flags=ignore-trx-errors --nodetach

Running under non-root

By default, the main Manticore process searchd is running under user manticore inside the container, but the script which runs on starting the container is run under your default docker user which in most cases is root. If that's not what you want you can use docker ... --user manticore or user: manticore in docker compose yaml to make everything run under manticore. Read below about possible volume permissions issue you can get and how to solve it.

Creating plain tables on startup

To build plain tables specified in your custom configuration file, you can use the CREATE_PLAIN_TABLES=1 environment variable. It will execute indexer --all before Manticore starts. This is useful if you don't use volumes, and your tables are easy to recreate.

docker run -e CREATE_PLAIN_TABLES=1 --name manticore -v $(pwd)/manticore.conf:/etc/manticoresearch/manticore.conf -p 9306:9306 -p 9308:9308 -d manticoresearch/manticore

Troubleshooting

Permissions issue with a mounted volume

In case you are running Manticore Search docker under non-root (using docker ... --user manticore or user: manticore in docker compose yaml), you can face a permissions issue, for example:

FATAL: directory /var/lib/manticore write error: failed to open /var/lib/manticore/tmp: Permission denied

or in case you are using -e EXTRA=1:

mkdir: cannot create directory ‘/var/lib/manticore/.mcl/’: Permission denied

This can happen because the user which is used to run processes inside the container may have no permissions to modify the directory you have mounted to the container. To fix it you can chown or chmod the mounted directory. If you run the container under user manticore you need to do:

chown -R 999:999 data

since user manticore has ID 999 inside the container.

¶ 5.4

Starting Manticore in Windows

On Windows, if you want Manticore to start at boot, you can install it as a Windows Service. You can follow the instructions in the Manticore as Windows Service guide to install Manticore as a service.

Once Manticore is installed as a service, you can start and stop it from the Control Panel or from the command line using the sc.exe command.

sc.exe start Manticore
sc.exe stop Manticore

Alternatively, if you don't install Manticore as a Windows service, you can start it from the command line by running the following command:

.\bin\searchd -c manticore.conf

This command assumes that you have the Manticore's binary and the configuration file in the current directory.

¶ 5.5

Starting Manticore in MacOS

Starting Manticore via HomeBrew package manager

If Manticore is installed using HomeBrew, you can run it as a Brew service.

To start Manticore, run the following command:

brew services start manticoresearch

To stop Manticore, run the following command:

brew services stop manticoresearch
¶ 6

Creating a table

¶ 6.1

Data types

Full-text fields and attributes

Manticore's data types can be split into two categories: full-text fields and attributes.

Full-text fields

Full-text fields:

Full-text fields are represented by the data type text. All other data types are called "attributes".

Attributes

Attributes are non-full-text values associated with each document that can be used to perform non-full-text filtering, sorting and grouping during a search.

It is often desired to process full-text search results based not only on matching document ID and its rank, but also on a number of other per-document values. For example, one might need to sort news search results by date and then relevance, or search through products within a specified price range, or limit a blog search to posts made by selected users, or group results by month. To do this efficiently, Manticore enables not only full-text fields, but also additional attributes to be added to each document. These attributes can be used to filter, sort, or group full-text matches, or to search only by attributes.

The attributes, unlike full-text fields, are not full-text indexed. They are stored in the table, but it is not possible to search them as full-text.

A good example for attributes would be a forum posts table. Assume that only the title and content fields need to be full-text searchable - but that sometimes it is also required to limit search to a certain author or a sub-forum (i.e., search only those rows that have some specific values of author_id or forum_id); or to sort matches by post_date column; or to group matching posts by month of the post_date and calculate per-group match counts.

SQL
CREATE TABLE forum(title text, content text, author_id int, forum_id int, post_date timestamp);
JSON
POST /cli -d "CREATE TABLE forum(title text, content text, author_id int, forum_id int, post_date timestamp)"
PHP
$index = new \Manticoresearch\Index($client);
$index->setName('forum');
$index->create([
    'title'=>['type'=>'text'],
    'content'=>['type'=>'text'],
    'author_id'=>['type'=>'int'],
    'forum_id'=>['type'=>'int'],
    'post_date'=>['type'=>'timestamp']
]);
Python
utilsApi.sql('CREATE TABLE forum(title text, content text, author_id int, forum_id int, post_date timestamp)')
Javascript
res = await utilsApi.sql('CREATE TABLE forum(title text, content text, author_id int, forum_id int, post_date timestamp)');
Java
utilsApi.sql("CREATE TABLE forum(title text, content text, author_id int, forum_id int, post_date timestamp)");
C#
utilsApi.Sql("CREATE TABLE forum(title text, content text, author_id int, forum_id int, post_date timestamp)");
config
table forum
{
    type = rt
    path = forum

    # when configuring fields via config, they are indexed (and not stored) by default
    rt_field = title
    rt_field = content

    # this option needs to be specified for the field to be stored
    stored_fields = title, content

    rt_attr_uint = author_id
    rt_attr_uint = forum_id
    rt_attr_timestamp = post_date
}

This example shows running a full-text query filtered by author_id, forum_id and sorted by post_date.

SQL
select * from forum where author_id=123 and forum_id in (1,3,7) order by post_date desc
JSON
POST /search
{
  "index": "forum",
  "query":
  {
    "match_all": {},
    "bool":
    {
      "must":
      [
        { "equals": { "author_id": 123 } },
        { "in": { "forum_id": [1,3,7] } }
      ]
    }
  },
  "sort": [ { "post_date": "desc" } ]
}
PHP
$client->search([
        'index' => 'forum',
        'query' =>
        [
            'match_all' => [],
            'bool' => [
                'must' => [
                    'equals' => ['author_id' => 123],
                    'in' => [
                        'forum_id' => [
                            1,3,7
                        ]
                    ]
                ]
            ]
        ],
        'sort' => [
        ['post_date' => 'desc']
    ]
]);
Python
searchApi.search({"index":"forum","query":{"match_all":{},"bool":{"must":[{"equals":{"author_id":123}},{"in":{"forum_id":[1,3,7]}}]}},"sort":[{"post_date":"desc"}]})
javascript
res = await searchApi.search({"index":"forum","query":{"match_all":{},"bool":{"must":[{"equals":{"author_id":123}},{"in":{"forum_id":[1,3,7]}}]}},"sort":[{"post_date":"desc"}]});
java
HashMap<String,Object> filters = new HashMap<String,Object>(){{
    put("must", new HashMap<String,Object>(){{
        put("equals",new HashMap<String,Integer>(){{
            put("author_id",123);
        }});
        put("in",
            new HashMap<String,Object>(){{
                put("forum_id",new int[] {1,3,7});
        }});
    }});
}};
Map<String,Object> query = new HashMap<String,Object>();
query.put("match_all",null);
query.put("bool",filters);
SearchRequest searchRequest = new SearchRequest();
searchRequest.setIndex("forum");
searchRequest.setQuery(query);
searchRequest.setSort(new ArrayList<Object>(){{
    add(new HashMap<String,String>(){{ put("post_date","desc");}});
}});
SearchResponse searchResponse = searchApi.search(searchRequest);
C#
object query =  new { match_all=null };
var searchRequest = new SearchRequest("forum", query);
var boolFilter = new BoolFilter();
boolFilter.Must = new List<Object> {
    new EqualsFilter("author_id", 123),
    new InFilter("forum_id", new List<Object> {1,3,7})
};
searchRequest.AttrFilter = boolFilter;
searchRequest.Sort = new List<Object> { new SortOrder("post_date", SortOrder.OrderEnum.Desc) };
var searchResponse = searchApi.Search(searchRequest);

Row-wise and columnar attribute storages

Manticore supports two types of attribute storages:

As can be understood from their names, they store data differently. The traditional row-wise storage:

With the columnar storage:

The columnar storage was designed to handle large data volume that does not fit into RAM, so the recommendations are:

How to switch between the storages

The traditional row-wise storage is the default, so if you want everything to be stored in a row-wise fashion, you don't need to do anything when you create a table.

To enable the columnar storage you need to:

create table tbl(title text, type int, price float engine='rowwise') engine='columnar'
create table tbl(title text, type int, price float engine='columnar');

or

create table tbl(title text, type int, price float engine='columnar') engine='rowwise';

Below is the list of data types supported by Manticore Search:

Document ID

The document identifier is a mandatory attribute, and document IDs must be unique 64-bit unsigned integers. Document IDs can be explicitly specified, but if not, they are still enabled. Document IDs cannot be updated. Note that when retrieving document IDs, they are treated as signed 64-bit integers, which means they may be negative. Use the UINT64() function to cast them to unsigned 64-bit integers if necessary.

Explicit ID
When you create a table, you can specify ID explicitly, but no matter what data type you use, it will be always as said previously - a signed 64-bit integer.
CREATE TABLE tbl(id bigint, content text);
DESC tbl;
+---------+--------+----------------+
| Field   | Type   | Properties     |
+---------+--------+----------------+
| id      | bigint |                |
| content | text   | indexed stored |
+---------+--------+----------------+
2 rows in set (0.00 sec)
Implicit ID
You can also omit specifying ID at all, it will be enabled automatically.
CREATE TABLE tbl(content text);
DESC tbl;
+---------+--------+----------------+
| Field   | Type   | Properties     |
+---------+--------+----------------+
| id      | bigint |                |
| content | text   | indexed stored |
+---------+--------+----------------+
2 rows in set (0.00 sec)

Character data types

General syntax:

string|text [stored|attribute] [indexed]

Properties:

  1. indexed - full-text indexed (can be used in full-text queries)
  2. stored - stored in a docstore (stored on disk, not in RAM, lazy read)
  3. attribute - makes it a string attribute (can sort/group by it)

Specifying at least one property overrides all the default ones (see below), i.e., if you decide to use a custom combination of properties, you need to list all the properties you want.

No properties specified:

string and text are aliases, but if you don’t specify any properties, they by default mean different things:

Text

The text (just text or text/string indexed) data type forms the full-text part of the table. Text fields are indexed and can be searched for keywords.

Text is passed through an analyzer pipeline that converts the text to words, applies morphology transformations, etc. Eventually, a full-text table (a special data structure that enables quick searches for a keyword) gets built from that text.

Full-text fields can only be used in the MATCH() clause and cannot be used for sorting or aggregation. Words are stored in an inverted index along with references to the fields they belong to and positions in the field. This allows searching a word inside each field and using advanced operators like proximity. By default, the original text of the fields is both indexed and stored in document storage. It means that the original text can be returned with the query results and used in search result highlighting.

SQL
CREATE TABLE products(title text);
JSON
POST /cli -d "CREATE TABLE products(title text)"
PHP
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
    'title'=>['type'=>'text']
]);
Python
utilsApi.sql('CREATE TABLE products(title text)')
javascript
res = await utilsApi.sql('CREATE TABLE products(title text)');
java
utilsApi.sql("CREATE TABLE products(title text)");
C#
utilsApi.Sql("CREATE TABLE products(title text)");
config
table products
{
    type = rt
    path = products

    # when configuring fields via config, they are indexed (and not stored) by default
    rt_field = title

    # this option needs to be specified for the field to be stored
    stored_fields = title
}

This behavior can be overridden by explicitly specifying that the text is only indexed.

SQL
CREATE TABLE products(title text indexed);
JSON
POST /cli -d "CREATE TABLE products(title text indexed)"
PHP
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
    'title'=>['type'=>'text','options'=>['indexed']]
]);
Python
utilsApi.sql('CREATE TABLE products(title text indexed)')
javascript
res = await utilsApi.sql('CREATE TABLE products(title text indexed)');
java
utilsApi.sql("CREATE TABLE products(title text indexed)");
C#
utilsApi.Sql("CREATE TABLE products(title text indexed)");
config
table products
{
    type = rt
    path = products

    # when configuring fields via config, they are indexed (and not stored) by default
    rt_field = title
}

Fields are named, and you can limit your searches to a single field (e.g. search through "title" only) or a subset of fields (e.g. "title" and "abstract" only). You can have up to 256 full-text fields.

SQL
select * from products where match('@title first');
JSON
POST /search
{
    "index": "products",
    "query":
    {
        "match": { "title": "first" }
    }
}
PHP
$index->setName('products')->search('@title')->get();
Python
searchApi.search({"index":"products","query":{"match":{"title":"first"}}})
javascript
res = await searchApi.search({"index":"products","query":{"match":{"title":"first"}}});
java
utilsApi.sql("CREATE TABLE products(title text indexed)");
C#
utilsApi.Sql("CREATE TABLE products(title text indexed)");

String

Unlike full-text fields, string attributes (just string or string/text attribute) are stored as they are received and cannot be used in full-text searches. Instead, they are returned in results, can be used in the WHERE clause for comparison filtering or REGEX, and can be used for sorting and aggregation. In general, it's not recommended to store large texts in string attributes, but use string attributes for metadata like names, titles, tags, keys.

If you want to also index the string attribute, you can specify both as string attribute indexed. It will allow full-text searching and works as an attribute.

SQL
CREATE TABLE products(title text, keys string);
JSON
POST /cli -d "CREATE TABLE products(title text, keys string)"
PHP
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
    'title'=>['type'=>'text'],
    'keys'=>['type'=>'string']
]);
Python
utilsApi.sql('CREATE TABLE products(title text, keys string)')
javascript
res = await utilsApi.sql('CREATE TABLE products(title text, keys string)');
java
utilsApi.sql("CREATE TABLE products(title text, keys string)");
C#
utilsApi.Sql("CREATE TABLE products(title text, keys string)");
config
table products
{
    type = rt
    path = products

    rt_field = title
    stored_fields = title

    rt_attr_string = keys
}
More You can create a full-text field that is also stored as a string attribute. This approach creates a full-text field and a string attribute that have the same name. Note that you can't add a `stored` property to store the data as a string attribute and in the document storage at the same time.
SQL
`string attribute indexed` means that we're working with a string data type that is stored as an attribute and indexed as a full-text field.
CREATE TABLE products ( title string attribute indexed );
JSON
POST /cli -d "CREATE TABLE products ( title string attribute indexed )"
PHP
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
    'title'=>['type'=>'string','options'=>['indexed','attribute']]
]);
Python
utilsApi.sql('CREATE TABLE products ( title string attribute indexed )')
javascript
res = await utilsApi.sql('CREATE TABLE products ( title string attribute indexed )');
java
utilsApi.sql("CREATE TABLE products ( title string attribute indexed )");
C#
utilsApi.Sql("CREATE TABLE products ( title string attribute indexed )");
config
table products
{
    type = rt
    path = products

    rt_field = title
    rt_attr_string = title
}

Integer

Integer type allows storing 32 bit unsigned integer values.

SQL
CREATE TABLE products(title text, price int);
JSON
POST /cli -d "CREATE TABLE products(title text, price int)"
PHP
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
    'title'=>['type'=>'text'],
    'price'=>['type'=>'int']
]);
Python
utilsApi.sql('CREATE TABLE products(title text, price int)')
javascript
res = await utilsApi.sql('CREATE TABLE products(title text, price int)');
java
utilsApi.sql("CREATE TABLE products(title text, price int)");
C#
utilsApi.Sql("CREATE TABLE products(title text, price int)");
config
table products
{
    type = rt
    path = products

    rt_field = title
    stored_fields = title

    rt_attr_uint = type
}

Integers can be stored in shorter sizes than 32-bit by specifying a bit count. For example, if we want to store a numeric value which we know is not going to be bigger than 8, the type can be defined as bit(3). Bitcount integers perform slower than the full-size ones, but they require less RAM. They are saved in 32-bit chunks, so in order to save space, they should be grouped at the end of attribute definitions (otherwise a bitcount integer between 2 full-size integers will occupy 32 bits as well).

SQL
CREATE TABLE products(title text, flags bit(3), tags bit(2) );
JSON
POST /cli -d "CREATE TABLE products(title text, flags bit(3), tags bit(2))"
PHP
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
    'title'=>['type'=>'text'],
    'flags'=>['type'=>'bit(3)'],
    'tags'=>['type'=>'bit(2)']
]);
Python
utilsApi.sql('CREATE TABLE products(title text, flags bit(3), tags bit(2) ')
javascript
res = await utilsApi.sql('CREATE TABLE products(title text, flags bit(3), tags bit(2) ');
java
utilsApi.sql("CREATE TABLE products(title text, flags bit(3), tags bit(2)");
C#
utilsApi.Sql("CREATE TABLE products(title text, flags bit(3), tags bit(2)");
config
table products
{
    type = rt
    path = products

    rt_field = title
    stored_fields = title

    rt_attr_uint = flags:3
    rt_attr_uint = tags:2
}

Big Integer

Big integers (bigint) are 64-bit wide signed integers.

SQL
CREATE TABLE products(title text, price bigint );
JSON
POST /cli -d "CREATE TABLE products(title text, price bigint)"
PHP
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
    'title'=>['type'=>'text'],
    'price'=>['type'=>'bigint']
]);
Python
utilsApi.sql('CREATE TABLE products(title text, price bigint )')
javascript
res = await utilsApi.sql('CREATE TABLE products(title text, price bigint )');
java
utilsApi.sql("CREATE TABLE products(title text, price bigint )");
C#
utilsApi.Sql("CREATE TABLE products(title text, price bigint )");
config
table products
{
    type = rt
    path = products

    rt_field = title
    stored_fields = title

    rt_attr_bigint = type
}

Boolean

Declares a boolean attribute. It's equivalent to an integer attribute with bit count of 1.

SQL
CREATE TABLE products(title text, sold bool );
JSON
POST /cli -d "CREATE TABLE products(title text, sold bool)"
PHP
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
    'title'=>['type'=>'text'],
    'sold'=>['type'=>'bool']
]);
Python
utilsApi.sql('CREATE TABLE products(title text, sold bool )')
javascript
res = await utilsApi.sql('CREATE TABLE products(title text, sold bool )');
java
utilsApi.sql("CREATE TABLE products(title text, sold bool )");
C#
utilsApi.Sql("CREATE TABLE products(title text, sold bool )");
config
table products
{
    type = rt
    path = products

    rt_field = title
    stored_fields = title

    rt_attr_bool = sold
}

Timestamps

Timestamp type represents unix timestamps which is stored as a 32-bit integer. The difference is that time and date functions are available for the timestamp type.

SQL
CREATE TABLE products(title text, date timestamp);
JSON
POST /cli -d "CREATE TABLE products(title text, date timestamp)"
PHP
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
    'title'=>['type'=>'text'],
    'date'=>['type'=>'timestamp']
]);
Python
utilsApi.sql('CREATE TABLE products(title text, date timestamp)')
javascript
res = await utilsApi.sql('CREATE TABLE products(title text, date timestamp)');
java
utilsApi.sql("CREATE TABLE products(title text, date timestamp)");
C#
utilsApi.Sql("CREATE TABLE products(title text, date timestamp)");
config
table products
{
    type = rt
    path = products

    rt_field = title
    stored_fields = title

    rt_attr_timestamp = date
}

Float

Real numbers are stored as 32-bit IEEE 754 single precision floats.

SQL
CREATE TABLE products(title text, coeff float);
JSON
POST /cli -d "CREATE TABLE products(title text, coeff float)"
PHP
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
    'title'=>['type'=>'text'],
    'coeff'=>['type'=>'float']
]);
Python
utilsApi.sql('CREATE TABLE products(title text, coeff float)')
javascript
res = await utilsApi.sql('CREATE TABLE products(title text, coeff float)');
java
utilsApi.sql("CREATE TABLE products(title text, coeff float)");
C#
utilsApi.Sql("CREATE TABLE products(title text, coeff float)");
config
table products
{
    type = rt
    path = products

    rt_field = title
    stored_fields = title

    rt_attr_float = coeff
}

Unlike integer types, comparing two floating-point numbers for equality is not recommended due to potential rounding errors. A more reliable approach is to use a near-equal comparison, by checking the absolute error margin.

SQL
select abs(a-b)<=0.00001 from products
JSON
POST /search
{
  "index": "products",
  "query": { "match_all": {} } },
  "expressions": { "eps": "abs(a-b)" }
}
PHP
$index->setName('products')->search('')->expression('eps','abs(a-b)')->get();
Python
searchApi.search({"index":"products","query":{"match_all":{}},"expressions":{"eps":"abs(a-b)"}})
javascript
res = await searchApi.search({"index":"products","query":{"match_all":{}}},"expressions":{"eps":"abs(a-b)"}});
java
searchRequest = new SearchRequest();
searchRequest.setIndex("forum");
query = new HashMap<String,Object>();
query.put("match_all",null);
searchRequest.setQuery(query);
Object expressions = new HashMap<String,Object>(){{
    put("ebs","abs(a-b)");
}};
searchRequest.setExpressions(expressions);
searchResponse = searchApi.search(searchRequest);
C#
object query =  new { match_all=null };
var searchRequest = new SearchRequest("forum", query);
searchRequest.Expressions = new List<Object>{
    new Dictionary<string, string> { {"ebs", "abs(a-b)"} }
};
var searchResponse = searchApi.Search(searchRequest);

Another alternative, which can also be used to perform IN(attr,val1,val2,val3) is to compare floats as integers by choosing a multiplier factor and convert the floats to integers in operations. The following example illustrates modifying IN(attr,2.0,2.5,3.5) to work with integer values.

SQL
select in(ceil(attr*100),200,250,350) from products
JSON
POST /search
{
  "index": "products",
  "query": { "match_all": {} } },
  "expressions": { "inc": "in(ceil(attr*100),200,250,350)" }
}
PHP
$index->setName('products')->search('')->expression('inc','in(ceil(attr*100),200,250,350)')->get();
Python
searchApi.search({"index":"products","query":{"match_all":{}}},"expressions":{"inc":"in(ceil(attr*100),200,250,350)"}})
javascript
res = await searchApi.search({"index":"products","query":{"match_all":{}}},"expressions":{"inc":"in(ceil(attr*100),200,250,350)"}});
java
searchRequest = new SearchRequest();
searchRequest.setIndex("forum");
query = new HashMap<String,Object>();
query.put("match_all",null);
searchRequest.setQuery(query);
Object expressions = new HashMap<String,Object>(){{
    put("inc","in(ceil(attr*100),200,250,350)");
}};
searchRequest.setExpressions(expressions);
searchResponse = searchApi.search(searchRequest);
C#
object query =  new { match_all=null };
var searchRequest = new SearchRequest("forum", query);
searchRequest.Expressions = new List<Object> {
    new Dictionary<string, string> { {"ebs", "in(ceil(attr*100),200,250,350)"} }
};
var searchResponse = searchApi.Search(searchRequest);

JSON

This data type allows storing JSON objects, which is useful for storing schema-less data. However, it is not supported by columnar storage. However, it can be stored in traditional storage, as it's possible to combine both storage types in the same table.

SQL
CREATE TABLE products(title text, data json);
JSON
POST /cli -d "CREATE TABLE products(title text, data json)"
PHP
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
    'title'=>['type'=>'text'],
    'data'=>['type'=>'json']
]);
Python
utilsApi.sql('CREATE TABLE products(title text, data json)')
javascript
res = await utilsApi.sql('CREATE TABLE products(title text, data json)');
java
utilsApi.sql'CREATE TABLE products(title text, data json)');
C#
utilsApi.Sql'CREATE TABLE products(title text, data json)');
config
table products
{
    type = rt
    path = products

    rt_field = title
    stored_fields = title

    rt_attr_json = data
}

JSON properties can be used in most operations. There are also special functions such as ALL(), ANY(), GREATEST(), LEAST() and INDEXOF() that allow traversal of property arrays.

SQL
select indexof(x>2 for x in data.intarray) from products
JSON
POST /search
{
  "index": "products",
  "query": { "match_all": {} } },
  "expressions": { "idx": "indexof(x>2 for x in data.intarray)" }
}
PHP
$index->setName('products')->search('')->expression('idx','indexof(x>2 for x in data.intarray)')->get();
Python
searchApi.search({"index":"products","query":{"match_all":{}}},"expressions":{"idx":"indexof(x>2 for x in data.intarray)"}})
javascript
res = await searchApi.search({"index":"products","query":{"match_all":{}}},"expressions":{"idx":"indexof(x>2 for x in data.intarray)"}});
java
searchRequest = new SearchRequest();
searchRequest.setIndex("forum");
query = new HashMap<String,Object>();
query.put("match_all",null);
searchRequest.setQuery(query);
Object expressions = new HashMap<String,Object>(){{
    put("idx","indexof(x>2 for x in data.intarray)");
}};
searchRequest.setExpressions(expressions);
searchResponse = searchApi.search(searchRequest);
C#
object query =  new { match_all=null };
var searchRequest = new SearchRequest("forum", query);
searchRequest.Expressions = new List<Object> {
    new Dictionary<string, string> { {"idx", "indexof(x>2 for x in data.intarray)"} }
};
var searchResponse = searchApi.Search(searchRequest);

Text properties are treated the same as strings, so it's not possible to use them in full-text match expressions. However, string functions such as REGEX() can be used.

SQL
select regex(data.name, 'est') as c from products where c>0
JSON
POST /search
{
  "index": "products",
  "query":
  {
    "match_all": {},
    "range": { "c": { "gt": 0 } } }
  },
  "expressions": { "c": "regex(data.name, 'est')" }
}
PHP
$index->setName('products')->search('')->expression('idx',"regex(data.name, 'est')")->filter('c','gt',0)->get();
Python
searchApi.search({"index":"products","query":{"match_all":{},"range":{"c":{"gt":0}}}},"expressions":{"c":"regex(data.name, 'est')"}})
javascript
res = await searchApi.search({"index":"products","query":{"match_all":{},"range":{"c":{"gt":0}}}},"expressions":{"c":"regex(data.name, 'est')"}});
java
searchRequest = new SearchRequest();
searchRequest.setIndex("forum");
query = new HashMap<String,Object>();
query.put("match_all",null);
query.put("range", new HashMap<String,Object>(){{
    put("c", new HashMap<String,Object>(){{
        put("gt",0);
    }});
}});
searchRequest.setQuery(query);
Object expressions = new HashMap<String,Object>(){{
    put("idx","indexof(x>2 for x in data.intarray)");
}};
searchRequest.setExpressions(expressions);
searchResponse = searchApi.search(searchRequest);
C#
object query =  new { match_all=null };
var searchRequest = new SearchRequest("forum", query);
var rangeFilter = new RangeFilter("c");
rangeFilter.Gt = 0;
searchRequest.AttrFilter = rangeFilter;
searchRequest.Expressions = new List<Object> {
    new Dictionary<string, string> { {"idx", "indexof(x>2 for x in data.intarray)"} }
};
var searchResponse = searchApi.Search(searchRequest);

In the case of JSON properties, enforcing data type may be required for proper functionality in certain situations. For example, when working with float values, DOUBLE() must be used for proper sorting.

SQL
select * from products order by double(data.myfloat) desc
JSON
POST /search
{
  "index": "products",
  "query": { "match_all": {} } },
  "sort": [ { "double(data.myfloat)": { "order": "desc"} } ]
}
PHP
$index->setName('products')->search('')->sort('double(data.myfloat)','desc')->get();
Python
searchApi.search({"index":"products","query":{"match_all":{}}},"sort":[{"double(data.myfloat)":{"order":"desc"}}]})
javascript
res = await searchApi.search({"index":"products","query":{"match_all":{}}},"sort":[{"double(data.myfloat)":{"order":"desc"}}]});
java
searchRequest = new SearchRequest();
searchRequest.setIndex("forum");
query = new HashMap<String,Object>();
query.put("match_all",null);
searchRequest.setQuery(query);
searchRequest.setSort(new ArrayList<Object>(){{
    add(new HashMap<String,String>(){{ put("double(data.myfloat)",new HashMap<String,String>(){{ put("order","desc");}});}});
}});
searchResponse = searchApi.search(searchRequest);
C#
object query =  new { match_all=null };
var searchRequest = new SearchRequest("forum", query);
searchRequest.Sort = new List<Object> {
    new SortOrder("double(data.myfloat)", SortOrder.OrderEnum.Desc)
};
var searchResponse = searchApi.Search(searchRequest);

Float vector

Float vector attributes allow storing variable-length lists of floats. It's important to note that this concept differs from multi-valued attributes. Multi-valued attributes (MVAs) are essentially sets; they do not preserve value order, and duplicate values are not retained. In contrast, float vectors perform no additional processing on values during insertion.

Float vector attributes can be used in k-nearest neighbor searches; see KNN search.

** Currently, float_vector fields can only be utilized in KNN search within real-time tables and the data type is not supported in any other functions or expressions, nor is it supported in plain tables. **

SQL
CREATE TABLE products(title text, image_vector float_vector);
JSON
POST /cli -d "CREATE TABLE products(title text, image_vector float_vector)"
PHP
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
    'title'=>['type'=>'text'],
    'image_vector'=>['type'=>'float_vector']
]);
Python
utilsApi.sql('CREATE TABLE products(title text, image_vector float_vector)')
javascript
res = await utilsApi.sql('CREATE TABLE products(title text, image_vector float_vector)');
java
utilsApi.sql("CREATE TABLE products(title text, image_vector float_vector)");
C#
utilsApi.Sql("CREATE TABLE products(title text, image_vector float_vector)");
config
table products
{
    type = rt
    path = products

    rt_field = title
    stored_fields = title

    rt_attr_float_vector = image_vector
}

Multi-value integer (MVA)

Multi-value attributes allow storing variable-length lists of 32-bit unsigned integers. This can be useful for storing one-to-many numeric values, such as tags, product categories, and properties.

SQL
CREATE TABLE products(title text, product_codes multi);
JSON
POST /cli -d "CREATE TABLE products(title text, product_codes multi)"
PHP
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
    'title'=>['type'=>'text'],
    'product_codes'=>['type'=>'multi']
]);
Python
utilsApi.sql('CREATE TABLE products(title text, product_codes multi)')
javascript
res = await utilsApi.sql('CREATE TABLE products(title text, product_codes multi)');
java
utilsApi.sql("CREATE TABLE products(title text, product_codes multi)");
C#
utilsApi.Sql("CREATE TABLE products(title text, product_codes multi)");
config
table products
{
    type = rt
    path = products

    rt_field = title
    stored_fields = title

    rt_attr_multi = product_codes
}

It supports filtering and aggregation, but not sorting. Filtering can be done using a condition that requires at least one element to pass (using ANY()) or all elements (ALL()) to pass.

SQL
select * from products where any(product_codes)=3
JSON
POST /search
{
  "index": "products",
  "query":
  {
    "match_all": {},
    "equals" : { "any(product_codes)": 3 }
  }
}
PHP
$index->setName('products')->search('')->filter('any(product_codes)','equals',3)->get();
Python
searchApi.search({"index":"products","query":{"match_all":{},"equals":{"any(product_codes)":3}}}})
javascript
res = await searchApi.search({"index":"products","query":{"match_all":{},"equals":{"any(product_codes)":3}}}})'
java
searchRequest = new SearchRequest();
searchRequest.setIndex("forum");
query = new HashMap<String,Object>();
query.put("match_all",null);
query.put("equals",new HashMap<String,Integer>(){{
     put("any(product_codes)",3);
}});
searchRequest.setQuery(query);
searchResponse = searchApi.search(searchRequest);
C#
object query =  new { match_all=null };
var searchRequest = new SearchRequest("forum", query);
searchRequest.AttrFilter = new EqualsFilter("any(product_codes)", 3);
var searchResponse = searchApi.Search(searchRequest);

Information like least or greatest element and length of the list can be extracted. An example shows ordering by the least element of a multi-value attribute.

SQL
select least(product_codes) l from products order by l asc
JSON
POST /search
{
  "index": "products",
  "query":
  {
    "match_all": {},
    "sort": [ { "product_codes":{ "order":"asc", "mode":"min" } } ]
  }
}
PHP
$index->setName('products')->search('')->sort('product_codes','asc','min')->get();
Python
searchApi.search({"index":"products","query":{"match_all":{},"sort":[{"product_codes":{"order":"asc","mode":"min"}}]}})
javascript
res = await searchApi.search({"index":"products","query":{"match_all":{},"sort":[{"product_codes":{"order":"asc","mode":"min"}}]}});
java
searchRequest = new SearchRequest();
searchRequest.setIndex("forum");
query = new HashMap<String,Object>();
query.put("match_all",null);
searchRequest.setQuery(query);
searchRequest.setSort(new ArrayList<Object>(){{
    add(new HashMap<String,String>(){{ put("product_codes",new HashMap<String,String>(){{ put("order","asc");put("mode","min");}});}});
}});
searchResponse = searchApi.search(searchRequest);
C#
object query =  new { match_all=null };
var searchRequest = new SearchRequest("forum", query);
searchRequest.Sort = new List<Object> {
    new SortMVA("product_codes", SortOrder.OrderEnum.Asc, SortMVA.ModeEnum.Min)
};
searchResponse = searchApi.search(searchRequest);

When grouping by a multi-value attribute, a document will contribute to as many groups as there are different values associated with that document. For instance, if a collection contains exactly one document having a 'product_codes' multi-value attribute with values 5, 7, and 11, grouping on 'product_codes' will produce 3 groups with COUNT(*)equal to 1 and GROUPBY() key values of 5, 7, and 11, respectively. Also, note that grouping by multi-value attributes may lead to duplicate documents in the result set because each document can participate in many groups.

SQL
insert into products values ( 1, 'doc one', (5,7,11) );
select id, count(*), groupby() from products group by product_codes;
Query OK, 1 row affected (0.00 sec)

+------+----------+-----------+
| id   | count(*) | groupby() |
+------+----------+-----------+
|    1 |        1 |        11 |
|    1 |        1 |         7 |
|    1 |        1 |         5 |
+------+----------+-----------+
3 rows in set (0.00 sec)

The order of the numbers inserted as values of multivalued attributes is not preserved. Values are stored internally as a sorted set.

SQL
insert into product values (1,'first',(4,2,1,3));
select * from products;
Query OK, 1 row affected (0.00 sec)

+------+---------------+-------+
| id   | product_codes | title |
+------+---------------+-------+
|    1 | 1,2,3,4       | first |
+------+---------------+-------+
1 row in set (0.01 sec)
JSON
POST /insert
{
    "index":"products",
    "id":1,
    "doc":
    {
        "title":"first",
        "product_codes":[4,2,1,3]
    }
}

POST /search
{
  "index": "products",
  "query": { "match_all": {} }
}
{
   "_index":"products",
   "_id":1,
   "created":true,
   "result":"created",
   "status":201
}

{
   "took":0,
   "timed_out":false,
   "hits":{
      "total":1,
      "hits":[
         {
            "_id":"1",
            "_score":1,
            "_source":{
               "product_codes":[
                  1,
                  2,
                  3,
                  4
               ],
               "title":"first"
            }
         }
      ]
   }
}
PHP
$index->addDocument([
    "title"=>"first",
    "product_codes"=>[4,2,1,3]
]);
$index->search('')-get();
Array
(
    [_index] => products
    [_id] => 1
    [created] => 1
    [result] => created
    [status] => 201
)
Array
(
    [took] => 0
    [timed_out] =>
    [hits] => Array
        (
            [total] => 1
            [hits] => Array
                (
                    [0] => Array
                        (
                            [_id] => 1
                            [_score] => 1
                            [_source] => Array
                                (
                                    [product_codes] => Array
                                        (
                                            [0] => 1
                                            [1] => 2
                                            [2] => 3
                                            [3] => 4
                                        )

                                    [title] => first
                                )
                        )
                )
        )
)
Python
indexApi.insert({"index":"products","id":1,"doc":{"title":"first","product_codes":[4,2,1,3]}})
searchApi.search({"index":"products","query":{"match_all":{}}})
{'created': True,
 'found': None,
 'id': 1,
 'index': 'products',
 'result': 'created'}
{'hits': {'hits': [{u'_id': u'1',
                    u'_score': 1,
                    u'_source': {u'product_codes': [1, 2, 3, 4],
                                 u'title': u'first'}}],
          'total': 1},
 'profile': None,
 'timed_out': False,
 'took': 29}
javascript
await indexApi.insert({"index":"products","id":1,"doc":{"title":"first","product_codes":[4,2,1,3]}});
res = await searchApi.search({"index":"products","query":{"match_all":{}}});
{"took":0,"timed_out":false,"hits":{"total":1,"hits":[{"_id":"1","_score":1,"_source":{"product_codes":[1,2,3,4],"title":"first"}}]}}
java
InsertDocumentRequest newdoc = new InsertDocumentRequest();
HashMap<String,Object> doc = new HashMap<String,Object>(){{
    put("title","first");
    put("product_codes",new int[] {4,2,1,3});
}};
newdoc.index("products").id(1L).setDoc(doc);
sqlresult = indexApi.insert(newdoc);
Map<String,Object> query = new HashMap<String,Object>();
query.put("match_all",null);
SearchRequest searchRequest = new SearchRequest();
searchRequest.setIndex("products");
searchRequest.setQuery(query);
SearchResponse searchResponse = searchApi.search(searchRequest);
System.out.println(searchResponse.toString() );
class SearchResponse {
    took: 0
    timedOut: false
    hits: class SearchResponseHits {
        total: 1
        hits: [{_id=1, _score=1, _source={product_codes=[1, 2, 3, 4], title=first}}]
        aggregations: null
    }
    profile: null
}
C#
Dictionary<string, Object> doc = new Dictionary<string, Object>();
doc.Add("title", "first");
doc.Add("product_codes", new List<Object> {4,2,1,3});
InsertDocumentRequest newdoc = new InsertDocumentRequest(index: "products", id: 1, doc: doc);
var sqlresult = indexApi.Insert(newdoc);
object query =  new { match_all=null };
var searchRequest = new SearchRequest("products", query);
var searchResponse = searchApi.Search(searchRequest);
Console.WriteLine(searchResponse.ToString())
class SearchResponse {
    took: 0
    timedOut: false
    hits: class SearchResponseHits {
        total: 1
        hits: [{_id=1, _score=1, _source={product_codes=[1, 2, 3, 4], title=first}}]
        aggregations: null
    }
    profile: null
}

Multi-value big integer

A data type that allows storing variable-length lists of 64-bit signed integers. It has the same functionality as multi-value integer.

SQL
CREATE TABLE products(title text, values multi64);
JSON
POST /cli -d "CREATE TABLE products(title text, values multi64)"
PHP
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
    'title'=>['type'=>'text'],
    'values'=>['type'=>'multi64']
]);
Python
utilsApi.sql('CREATE TABLE products(title text, values multi64))')
javascript
res = await utilsApi.sql('CREATE TABLE products(title text, values multi64))');
java
utilsApi.sql("CREATE TABLE products(title text, values multi64))");
C#
utilsApi.Sql("CREATE TABLE products(title text, values multi64))");
config
table products
{
    type = rt
    path = products

    rt_field = title
    stored_fields = title

    rt_attr_multi_64 = values
}

Columnar attribute properties

When you use the columnar storage you can specify the following properties for the attributes.

fast_fetch

By default, Manticore Columnar storage stores all attributes in a columnar fashion, as well as in a special docstore row by row. This enables fast execution of queries like SELECT * FROM ..., especially when fetching a large number of records at once. However, if you are sure that you do not need it or wish to save disk space, you can disable it by specifying fast_fetch='0' when creating a table or (if you are defining a table in a config) by using columnar_no_fast_fetch as shown in the following example.

RT mode
create table t(a int, b int fast_fetch='0') engine='columnar'; desc t;
+-------+--------+---------------------+
| Field | Type   | Properties          |
+-------+--------+---------------------+
| id    | bigint | columnar fast_fetch |
| a     | uint   | columnar fast_fetch |
| b     | uint   | columnar            |
+-------+--------+---------------------+
3 rows in set (0.00 sec)
Plain mode
source min {
    type = mysql
    sql_host = localhost
    sql_user = test
    sql_pass =
    sql_db = test
    sql_query = select 1, 1 a, 1 b
    sql_attr_uint = a
    sql_attr_uint = b
}

table tbl {
    path = tbl/col
    source = min
    columnar_attrs = *
    columnar_no_fast_fetch = b
}
+-------+--------+---------------------+
| Field | Type   | Properties          |
+-------+--------+---------------------+
| id    | bigint | columnar fast_fetch |
| a     | uint   | columnar fast_fetch |
| b     | uint   | columnar            |
+-------+--------+---------------------+
¶ 6.1.1

Data types

Full-text fields and attributes

Manticore's data types can be split into two categories: full-text fields and attributes.

Full-text fields

Full-text fields:

Full-text fields are represented by the data type text. All other data types are called "attributes".

Attributes

Attributes are non-full-text values associated with each document that can be used to perform non-full-text filtering, sorting and grouping during a search.

It is often desired to process full-text search results based not only on matching document ID and its rank, but also on a number of other per-document values. For example, one might need to sort news search results by date and then relevance, or search through products within a specified price range, or limit a blog search to posts made by selected users, or group results by month. To do this efficiently, Manticore enables not only full-text fields, but also additional attributes to be added to each document. These attributes can be used to filter, sort, or group full-text matches, or to search only by attributes.

The attributes, unlike full-text fields, are not full-text indexed. They are stored in the table, but it is not possible to search them as full-text.

A good example for attributes would be a forum posts table. Assume that only the title and content fields need to be full-text searchable - but that sometimes it is also required to limit search to a certain author or a sub-forum (i.e., search only those rows that have some specific values of author_id or forum_id); or to sort matches by post_date column; or to group matching posts by month of the post_date and calculate per-group match counts.

SQL
CREATE TABLE forum(title text, content text, author_id int, forum_id int, post_date timestamp);
JSON
POST /cli -d "CREATE TABLE forum(title text, content text, author_id int, forum_id int, post_date timestamp)"
PHP
$index = new \Manticoresearch\Index($client);
$index->setName('forum');
$index->create([
    'title'=>['type'=>'text'],
    'content'=>['type'=>'text'],
    'author_id'=>['type'=>'int'],
    'forum_id'=>['type'=>'int'],
    'post_date'=>['type'=>'timestamp']
]);
Python
utilsApi.sql('CREATE TABLE forum(title text, content text, author_id int, forum_id int, post_date timestamp)')
Javascript
res = await utilsApi.sql('CREATE TABLE forum(title text, content text, author_id int, forum_id int, post_date timestamp)');
Java
utilsApi.sql("CREATE TABLE forum(title text, content text, author_id int, forum_id int, post_date timestamp)");
C#
utilsApi.Sql("CREATE TABLE forum(title text, content text, author_id int, forum_id int, post_date timestamp)");
config
table forum
{
    type = rt
    path = forum

    # when configuring fields via config, they are indexed (and not stored) by default
    rt_field = title
    rt_field = content

    # this option needs to be specified for the field to be stored
    stored_fields = title, content

    rt_attr_uint = author_id
    rt_attr_uint = forum_id
    rt_attr_timestamp = post_date
}

This example shows running a full-text query filtered by author_id, forum_id and sorted by post_date.

SQL
select * from forum where author_id=123 and forum_id in (1,3,7) order by post_date desc
JSON
POST /search
{
  "index": "forum",
  "query":
  {
    "match_all": {},
    "bool":
    {
      "must":
      [
        { "equals": { "author_id": 123 } },
        { "in": { "forum_id": [1,3,7] } }
      ]
    }
  },
  "sort": [ { "post_date": "desc" } ]
}
PHP
$client->search([
        'index' => 'forum',
        'query' =>
        [
            'match_all' => [],
            'bool' => [
                'must' => [
                    'equals' => ['author_id' => 123],
                    'in' => [
                        'forum_id' => [
                            1,3,7
                        ]
                    ]
                ]
            ]
        ],
        'sort' => [
        ['post_date' => 'desc']
    ]
]);
Python
searchApi.search({"index":"forum","query":{"match_all":{},"bool":{"must":[{"equals":{"author_id":123}},{"in":{"forum_id":[1,3,7]}}]}},"sort":[{"post_date":"desc"}]})
javascript
res = await searchApi.search({"index":"forum","query":{"match_all":{},"bool":{"must":[{"equals":{"author_id":123}},{"in":{"forum_id":[1,3,7]}}]}},"sort":[{"post_date":"desc"}]});
java
HashMap<String,Object> filters = new HashMap<String,Object>(){{
    put("must", new HashMap<String,Object>(){{
        put("equals",new HashMap<String,Integer>(){{
            put("author_id",123);
        }});
        put("in",
            new HashMap<String,Object>(){{
                put("forum_id",new int[] {1,3,7});
        }});
    }});
}};
Map<String,Object> query = new HashMap<String,Object>();
query.put("match_all",null);
query.put("bool",filters);
SearchRequest searchRequest = new SearchRequest();
searchRequest.setIndex("forum");
searchRequest.setQuery(query);
searchRequest.setSort(new ArrayList<Object>(){{
    add(new HashMap<String,String>(){{ put("post_date","desc");}});
}});
SearchResponse searchResponse = searchApi.search(searchRequest);
C#
object query =  new { match_all=null };
var searchRequest = new SearchRequest("forum", query);
var boolFilter = new BoolFilter();
boolFilter.Must = new List<Object> {
    new EqualsFilter("author_id", 123),
    new InFilter("forum_id", new List<Object> {1,3,7})
};
searchRequest.AttrFilter = boolFilter;
searchRequest.Sort = new List<Object> { new SortOrder("post_date", SortOrder.OrderEnum.Desc) };
var searchResponse = searchApi.Search(searchRequest);

Row-wise and columnar attribute storages

Manticore supports two types of attribute storages:

As can be understood from their names, they store data differently. The traditional row-wise storage:

With the columnar storage:

The columnar storage was designed to handle large data volume that does not fit into RAM, so the recommendations are:

How to switch between the storages

The traditional row-wise storage is the default, so if you want everything to be stored in a row-wise fashion, you don't need to do anything when you create a table.

To enable the columnar storage you need to:

create table tbl(title text, type int, price float engine='rowwise') engine='columnar'
create table tbl(title text, type int, price float engine='columnar');

or

create table tbl(title text, type int, price float engine='columnar') engine='rowwise';

Below is the list of data types supported by Manticore Search:

Document ID

The document identifier is a mandatory attribute, and document IDs must be unique 64-bit unsigned integers. Document IDs can be explicitly specified, but if not, they are still enabled. Document IDs cannot be updated. Note that when retrieving document IDs, they are treated as signed 64-bit integers, which means they may be negative. Use the UINT64() function to cast them to unsigned 64-bit integers if necessary.

Explicit ID
When you create a table, you can specify ID explicitly, but no matter what data type you use, it will be always as said previously - a signed 64-bit integer.
CREATE TABLE tbl(id bigint, content text);
DESC tbl;
+---------+--------+----------------+
| Field   | Type   | Properties     |
+---------+--------+----------------+
| id      | bigint |                |
| content | text   | indexed stored |
+---------+--------+----------------+
2 rows in set (0.00 sec)
Implicit ID
You can also omit specifying ID at all, it will be enabled automatically.
CREATE TABLE tbl(content text);
DESC tbl;
+---------+--------+----------------+
| Field   | Type   | Properties     |
+---------+--------+----------------+
| id      | bigint |                |
| content | text   | indexed stored |
+---------+--------+----------------+
2 rows in set (0.00 sec)

Character data types

General syntax:

string|text [stored|attribute] [indexed]

Properties:

  1. indexed - full-text indexed (can be used in full-text queries)
  2. stored - stored in a docstore (stored on disk, not in RAM, lazy read)
  3. attribute - makes it a string attribute (can sort/group by it)

Specifying at least one property overrides all the default ones (see below), i.e., if you decide to use a custom combination of properties, you need to list all the properties you want.

No properties specified:

string and text are aliases, but if you don’t specify any properties, they by default mean different things:

Text

The text (just text or text/string indexed) data type forms the full-text part of the table. Text fields are indexed and can be searched for keywords.

Text is passed through an analyzer pipeline that converts the text to words, applies morphology transformations, etc. Eventually, a full-text table (a special data structure that enables quick searches for a keyword) gets built from that text.

Full-text fields can only be used in the MATCH() clause and cannot be used for sorting or aggregation. Words are stored in an inverted index along with references to the fields they belong to and positions in the field. This allows searching a word inside each field and using advanced operators like proximity. By default, the original text of the fields is both indexed and stored in document storage. It means that the original text can be returned with the query results and used in search result highlighting.

SQL
CREATE TABLE products(title text);
JSON
POST /cli -d "CREATE TABLE products(title text)"
PHP
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
    'title'=>['type'=>'text']
]);
Python
utilsApi.sql('CREATE TABLE products(title text)')
javascript
res = await utilsApi.sql('CREATE TABLE products(title text)');
java
utilsApi.sql("CREATE TABLE products(title text)");
C#
utilsApi.Sql("CREATE TABLE products(title text)");
config
table products
{
    type = rt
    path = products

    # when configuring fields via config, they are indexed (and not stored) by default
    rt_field = title

    # this option needs to be specified for the field to be stored
    stored_fields = title
}

This behavior can be overridden by explicitly specifying that the text is only indexed.

SQL
CREATE TABLE products(title text indexed);
JSON
POST /cli -d "CREATE TABLE products(title text indexed)"
PHP
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
    'title'=>['type'=>'text','options'=>['indexed']]
]);
Python
utilsApi.sql('CREATE TABLE products(title text indexed)')
javascript
res = await utilsApi.sql('CREATE TABLE products(title text indexed)');
java
utilsApi.sql("CREATE TABLE products(title text indexed)");
C#
utilsApi.Sql("CREATE TABLE products(title text indexed)");
config
table products
{
    type = rt
    path = products

    # when configuring fields via config, they are indexed (and not stored) by default
    rt_field = title
}

Fields are named, and you can limit your searches to a single field (e.g. search through "title" only) or a subset of fields (e.g. "title" and "abstract" only). You can have up to 256 full-text fields.

SQL
select * from products where match('@title first');
JSON
POST /search
{
    "index": "products",
    "query":
    {
        "match": { "title": "first" }
    }
}
PHP
$index->setName('products')->search('@title')->get();
Python
searchApi.search({"index":"products","query":{"match":{"title":"first"}}})
javascript
res = await searchApi.search({"index":"products","query":{"match":{"title":"first"}}});
java
utilsApi.sql("CREATE TABLE products(title text indexed)");
C#
utilsApi.Sql("CREATE TABLE products(title text indexed)");

String

Unlike full-text fields, string attributes (just string or string/text attribute) are stored as they are received and cannot be used in full-text searches. Instead, they are returned in results, can be used in the WHERE clause for comparison filtering or REGEX, and can be used for sorting and aggregation. In general, it's not recommended to store large texts in string attributes, but use string attributes for metadata like names, titles, tags, keys.

If you want to also index the string attribute, you can specify both as string attribute indexed. It will allow full-text searching and works as an attribute.

SQL
CREATE TABLE products(title text, keys string);
JSON
POST /cli -d "CREATE TABLE products(title text, keys string)"
PHP
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
    'title'=>['type'=>'text'],
    'keys'=>['type'=>'string']
]);
Python
utilsApi.sql('CREATE TABLE products(title text, keys string)')
javascript
res = await utilsApi.sql('CREATE TABLE products(title text, keys string)');
java
utilsApi.sql("CREATE TABLE products(title text, keys string)");
C#
utilsApi.Sql("CREATE TABLE products(title text, keys string)");
config
table products
{
    type = rt
    path = products

    rt_field = title
    stored_fields = title

    rt_attr_string = keys
}
More You can create a full-text field that is also stored as a string attribute. This approach creates a full-text field and a string attribute that have the same name. Note that you can't add a `stored` property to store the data as a string attribute and in the document storage at the same time.
SQL
`string attribute indexed` means that we're working with a string data type that is stored as an attribute and indexed as a full-text field.
CREATE TABLE products ( title string attribute indexed );
JSON
POST /cli -d "CREATE TABLE products ( title string attribute indexed )"
PHP
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
    'title'=>['type'=>'string','options'=>['indexed','attribute']]
]);
Python
utilsApi.sql('CREATE TABLE products ( title string attribute indexed )')
javascript
res = await utilsApi.sql('CREATE TABLE products ( title string attribute indexed )');
java
utilsApi.sql("CREATE TABLE products ( title string attribute indexed )");
C#
utilsApi.Sql("CREATE TABLE products ( title string attribute indexed )");
config
table products
{
    type = rt
    path = products

    rt_field = title
    rt_attr_string = title
}

Integer

Integer type allows storing 32 bit unsigned integer values.

SQL
CREATE TABLE products(title text, price int);
JSON
POST /cli -d "CREATE TABLE products(title text, price int)"
PHP
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
    'title'=>['type'=>'text'],
    'price'=>['type'=>'int']
]);
Python
utilsApi.sql('CREATE TABLE products(title text, price int)')
javascript
res = await utilsApi.sql('CREATE TABLE products(title text, price int)');
java
utilsApi.sql("CREATE TABLE products(title text, price int)");
C#
utilsApi.Sql("CREATE TABLE products(title text, price int)");
config
table products
{
    type = rt
    path = products

    rt_field = title
    stored_fields = title

    rt_attr_uint = type
}

Integers can be stored in shorter sizes than 32-bit by specifying a bit count. For example, if we want to store a numeric value which we know is not going to be bigger than 8, the type can be defined as bit(3). Bitcount integers perform slower than the full-size ones, but they require less RAM. They are saved in 32-bit chunks, so in order to save space, they should be grouped at the end of attribute definitions (otherwise a bitcount integer between 2 full-size integers will occupy 32 bits as well).

SQL
CREATE TABLE products(title text, flags bit(3), tags bit(2) );
JSON
POST /cli -d "CREATE TABLE products(title text, flags bit(3), tags bit(2))"
PHP
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
    'title'=>['type'=>'text'],
    'flags'=>['type'=>'bit(3)'],
    'tags'=>['type'=>'bit(2)']
]);
Python
utilsApi.sql('CREATE TABLE products(title text, flags bit(3), tags bit(2) ')
javascript
res = await utilsApi.sql('CREATE TABLE products(title text, flags bit(3), tags bit(2) ');
java
utilsApi.sql("CREATE TABLE products(title text, flags bit(3), tags bit(2)");
C#
utilsApi.Sql("CREATE TABLE products(title text, flags bit(3), tags bit(2)");
config
table products
{
    type = rt
    path = products

    rt_field = title
    stored_fields = title

    rt_attr_uint = flags:3
    rt_attr_uint = tags:2
}

Big Integer

Big integers (bigint) are 64-bit wide signed integers.

SQL
CREATE TABLE products(title text, price bigint );
JSON
POST /cli -d "CREATE TABLE products(title text, price bigint)"
PHP
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
    'title'=>['type'=>'text'],
    'price'=>['type'=>'bigint']
]);
Python
utilsApi.sql('CREATE TABLE products(title text, price bigint )')
javascript
res = await utilsApi.sql('CREATE TABLE products(title text, price bigint )');
java
utilsApi.sql("CREATE TABLE products(title text, price bigint )");
C#
utilsApi.Sql("CREATE TABLE products(title text, price bigint )");
config
table products
{
    type = rt
    path = products

    rt_field = title
    stored_fields = title

    rt_attr_bigint = type
}

Boolean

Declares a boolean attribute. It's equivalent to an integer attribute with bit count of 1.

SQL
CREATE TABLE products(title text, sold bool );
JSON
POST /cli -d "CREATE TABLE products(title text, sold bool)"
PHP
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
    'title'=>['type'=>'text'],
    'sold'=>['type'=>'bool']
]);
Python
utilsApi.sql('CREATE TABLE products(title text, sold bool )')
javascript
res = await utilsApi.sql('CREATE TABLE products(title text, sold bool )');
java
utilsApi.sql("CREATE TABLE products(title text, sold bool )");
C#
utilsApi.Sql("CREATE TABLE products(title text, sold bool )");
config
table products
{
    type = rt
    path = products

    rt_field = title
    stored_fields = title

    rt_attr_bool = sold
}

Timestamps

Timestamp type represents unix timestamps which is stored as a 32-bit integer. The difference is that time and date functions are available for the timestamp type.

SQL
CREATE TABLE products(title text, date timestamp);
JSON
POST /cli -d "CREATE TABLE products(title text, date timestamp)"
PHP
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
    'title'=>['type'=>'text'],
    'date'=>['type'=>'timestamp']
]);
Python
utilsApi.sql('CREATE TABLE products(title text, date timestamp)')
javascript
res = await utilsApi.sql('CREATE TABLE products(title text, date timestamp)');
java
utilsApi.sql("CREATE TABLE products(title text, date timestamp)");
C#
utilsApi.Sql("CREATE TABLE products(title text, date timestamp)");
config
table products
{
    type = rt
    path = products

    rt_field = title
    stored_fields = title

    rt_attr_timestamp = date
}

Float

Real numbers are stored as 32-bit IEEE 754 single precision floats.

SQL
CREATE TABLE products(title text, coeff float);
JSON
POST /cli -d "CREATE TABLE products(title text, coeff float)"
PHP
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
    'title'=>['type'=>'text'],
    'coeff'=>['type'=>'float']
]);
Python
utilsApi.sql('CREATE TABLE products(title text, coeff float)')
javascript
res = await utilsApi.sql('CREATE TABLE products(title text, coeff float)');
java
utilsApi.sql("CREATE TABLE products(title text, coeff float)");
C#
utilsApi.Sql("CREATE TABLE products(title text, coeff float)");
config
table products
{
    type = rt
    path = products

    rt_field = title
    stored_fields = title

    rt_attr_float = coeff
}

Unlike integer types, comparing two floating-point numbers for equality is not recommended due to potential rounding errors. A more reliable approach is to use a near-equal comparison, by checking the absolute error margin.

SQL
select abs(a-b)<=0.00001 from products
JSON
POST /search
{
  "index": "products",
  "query": { "match_all": {} } },
  "expressions": { "eps": "abs(a-b)" }
}
PHP
$index->setName('products')->search('')->expression('eps','abs(a-b)')->get();
Python
searchApi.search({"index":"products","query":{"match_all":{}},"expressions":{"eps":"abs(a-b)"}})
javascript
res = await searchApi.search({"index":"products","query":{"match_all":{}}},"expressions":{"eps":"abs(a-b)"}});
java
searchRequest = new SearchRequest();
searchRequest.setIndex("forum");
query = new HashMap<String,Object>();
query.put("match_all",null);
searchRequest.setQuery(query);
Object expressions = new HashMap<String,Object>(){{
    put("ebs","abs(a-b)");
}};
searchRequest.setExpressions(expressions);
searchResponse = searchApi.search(searchRequest);
C#
object query =  new { match_all=null };
var searchRequest = new SearchRequest("forum", query);
searchRequest.Expressions = new List<Object>{
    new Dictionary<string, string> { {"ebs", "abs(a-b)"} }
};
var searchResponse = searchApi.Search(searchRequest);

Another alternative, which can also be used to perform IN(attr,val1,val2,val3) is to compare floats as integers by choosing a multiplier factor and convert the floats to integers in operations. The following example illustrates modifying IN(attr,2.0,2.5,3.5) to work with integer values.

SQL
select in(ceil(attr*100),200,250,350) from products
JSON
POST /search
{
  "index": "products",
  "query": { "match_all": {} } },
  "expressions": { "inc": "in(ceil(attr*100),200,250,350)" }
}
PHP
$index->setName('products')->search('')->expression('inc','in(ceil(attr*100),200,250,350)')->get();
Python
searchApi.search({"index":"products","query":{"match_all":{}}},"expressions":{"inc":"in(ceil(attr*100),200,250,350)"}})
javascript
res = await searchApi.search({"index":"products","query":{"match_all":{}}},"expressions":{"inc":"in(ceil(attr*100),200,250,350)"}});
java
searchRequest = new SearchRequest();
searchRequest.setIndex("forum");
query = new HashMap<String,Object>();
query.put("match_all",null);
searchRequest.setQuery(query);
Object expressions = new HashMap<String,Object>(){{
    put("inc","in(ceil(attr*100),200,250,350)");
}};
searchRequest.setExpressions(expressions);
searchResponse = searchApi.search(searchRequest);
C#
object query =  new { match_all=null };
var searchRequest = new SearchRequest("forum", query);
searchRequest.Expressions = new List<Object> {
    new Dictionary<string, string> { {"ebs", "in(ceil(attr*100),200,250,350)"} }
};
var searchResponse = searchApi.Search(searchRequest);

JSON

This data type allows storing JSON objects, which is useful for storing schema-less data. However, it is not supported by columnar storage. However, it can be stored in traditional storage, as it's possible to combine both storage types in the same table.

SQL
CREATE TABLE products(title text, data json);
JSON
POST /cli -d "CREATE TABLE products(title text, data json)"
PHP
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
    'title'=>['type'=>'text'],
    'data'=>['type'=>'json']
]);
Python
utilsApi.sql('CREATE TABLE products(title text, data json)')
javascript
res = await utilsApi.sql('CREATE TABLE products(title text, data json)');
java
utilsApi.sql'CREATE TABLE products(title text, data json)');
C#
utilsApi.Sql'CREATE TABLE products(title text, data json)');
config
table products
{
    type = rt
    path = products

    rt_field = title
    stored_fields = title

    rt_attr_json = data
}

JSON properties can be used in most operations. There are also special functions such as ALL(), ANY(), GREATEST(), LEAST() and INDEXOF() that allow traversal of property arrays.

SQL
select indexof(x>2 for x in data.intarray) from products
JSON
POST /search
{
  "index": "products",
  "query": { "match_all": {} } },
  "expressions": { "idx": "indexof(x>2 for x in data.intarray)" }
}
PHP
$index->setName('products')->search('')->expression('idx','indexof(x>2 for x in data.intarray)')->get();
Python
searchApi.search({"index":"products","query":{"match_all":{}}},"expressions":{"idx":"indexof(x>2 for x in data.intarray)"}})
javascript
res = await searchApi.search({"index":"products","query":{"match_all":{}}},"expressions":{"idx":"indexof(x>2 for x in data.intarray)"}});
java
searchRequest = new SearchRequest();
searchRequest.setIndex("forum");
query = new HashMap<String,Object>();
query.put("match_all",null);
searchRequest.setQuery(query);
Object expressions = new HashMap<String,Object>(){{
    put("idx","indexof(x>2 for x in data.intarray)");
}};
searchRequest.setExpressions(expressions);
searchResponse = searchApi.search(searchRequest);
C#
object query =  new { match_all=null };
var searchRequest = new SearchRequest("forum", query);
searchRequest.Expressions = new List<Object> {
    new Dictionary<string, string> { {"idx", "indexof(x>2 for x in data.intarray)"} }
};
var searchResponse = searchApi.Search(searchRequest);

Text properties are treated the same as strings, so it's not possible to use them in full-text match expressions. However, string functions such as REGEX() can be used.

SQL
select regex(data.name, 'est') as c from products where c>0
JSON
POST /search
{
  "index": "products",
  "query":
  {
    "match_all": {},
    "range": { "c": { "gt": 0 } } }
  },
  "expressions": { "c": "regex(data.name, 'est')" }
}
PHP
$index->setName('products')->search('')->expression('idx',"regex(data.name, 'est')")->filter('c','gt',0)->get();
Python
searchApi.search({"index":"products","query":{"match_all":{},"range":{"c":{"gt":0}}}},"expressions":{"c":"regex(data.name, 'est')"}})
javascript
res = await searchApi.search({"index":"products","query":{"match_all":{},"range":{"c":{"gt":0}}}},"expressions":{"c":"regex(data.name, 'est')"}});
java
searchRequest = new SearchRequest();
searchRequest.setIndex("forum");
query = new HashMap<String,Object>();
query.put("match_all",null);
query.put("range", new HashMap<String,Object>(){{
    put("c", new HashMap<String,Object>(){{
        put("gt",0);
    }});
}});
searchRequest.setQuery(query);
Object expressions = new HashMap<String,Object>(){{
    put("idx","indexof(x>2 for x in data.intarray)");
}};
searchRequest.setExpressions(expressions);
searchResponse = searchApi.search(searchRequest);
C#
object query =  new { match_all=null };
var searchRequest = new SearchRequest("forum", query);
var rangeFilter = new RangeFilter("c");
rangeFilter.Gt = 0;
searchRequest.AttrFilter = rangeFilter;
searchRequest.Expressions = new List<Object> {
    new Dictionary<string, string> { {"idx", "indexof(x>2 for x in data.intarray)"} }
};
var searchResponse = searchApi.Search(searchRequest);

In the case of JSON properties, enforcing data type may be required for proper functionality in certain situations. For example, when working with float values, DOUBLE() must be used for proper sorting.

SQL
select * from products order by double(data.myfloat) desc
JSON
POST /search
{
  "index": "products",
  "query": { "match_all": {} } },
  "sort": [ { "double(data.myfloat)": { "order": "desc"} } ]
}
PHP
$index->setName('products')->search('')->sort('double(data.myfloat)','desc')->get();
Python
searchApi.search({"index":"products","query":{"match_all":{}}},"sort":[{"double(data.myfloat)":{"order":"desc"}}]})
javascript
res = await searchApi.search({"index":"products","query":{"match_all":{}}},"sort":[{"double(data.myfloat)":{"order":"desc"}}]});
java
searchRequest = new SearchRequest();
searchRequest.setIndex("forum");
query = new HashMap<String,Object>();
query.put("match_all",null);
searchRequest.setQuery(query);
searchRequest.setSort(new ArrayList<Object>(){{
    add(new HashMap<String,String>(){{ put("double(data.myfloat)",new HashMap<String,String>(){{ put("order","desc");}});}});
}});
searchResponse = searchApi.search(searchRequest);
C#
object query =  new { match_all=null };
var searchRequest = new SearchRequest("forum", query);
searchRequest.Sort = new List<Object> {
    new SortOrder("double(data.myfloat)", SortOrder.OrderEnum.Desc)
};
var searchResponse = searchApi.Search(searchRequest);

Float vector

Float vector attributes allow storing variable-length lists of floats. It's important to note that this concept differs from multi-valued attributes. Multi-valued attributes (MVAs) are essentially sets; they do not preserve value order, and duplicate values are not retained. In contrast, float vectors perform no additional processing on values during insertion.

Float vector attributes can be used in k-nearest neighbor searches; see KNN search.

** Currently, float_vector fields can only be utilized in KNN search within real-time tables and the data type is not supported in any other functions or expressions, nor is it supported in plain tables. **

SQL
CREATE TABLE products(title text, image_vector float_vector);
JSON
POST /cli -d "CREATE TABLE products(title text, image_vector float_vector)"
PHP
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
    'title'=>['type'=>'text'],
    'image_vector'=>['type'=>'float_vector']
]);
Python
utilsApi.sql('CREATE TABLE products(title text, image_vector float_vector)')
javascript
res = await utilsApi.sql('CREATE TABLE products(title text, image_vector float_vector)');
java
utilsApi.sql("CREATE TABLE products(title text, image_vector float_vector)");
C#
utilsApi.Sql("CREATE TABLE products(title text, image_vector float_vector)");
config
table products
{
    type = rt
    path = products

    rt_field = title
    stored_fields = title

    rt_attr_float_vector = image_vector
}

Multi-value integer (MVA)

Multi-value attributes allow storing variable-length lists of 32-bit unsigned integers. This can be useful for storing one-to-many numeric values, such as tags, product categories, and properties.

SQL
CREATE TABLE products(title text, product_codes multi);
JSON
POST /cli -d "CREATE TABLE products(title text, product_codes multi)"
PHP
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
    'title'=>['type'=>'text'],
    'product_codes'=>['type'=>'multi']
]);
Python
utilsApi.sql('CREATE TABLE products(title text, product_codes multi)')
javascript
res = await utilsApi.sql('CREATE TABLE products(title text, product_codes multi)');
java
utilsApi.sql("CREATE TABLE products(title text, product_codes multi)");
C#
utilsApi.Sql("CREATE TABLE products(title text, product_codes multi)");
config
table products
{
    type = rt
    path = products

    rt_field = title
    stored_fields = title

    rt_attr_multi = product_codes
}

It supports filtering and aggregation, but not sorting. Filtering can be done using a condition that requires at least one element to pass (using ANY()) or all elements (ALL()) to pass.

SQL
select * from products where any(product_codes)=3
JSON
POST /search
{
  "index": "products",
  "query":
  {
    "match_all": {},
    "equals" : { "any(product_codes)": 3 }
  }
}
PHP
$index->setName('products')->search('')->filter('any(product_codes)','equals',3)->get();
Python
searchApi.search({"index":"products","query":{"match_all":{},"equals":{"any(product_codes)":3}}}})
javascript
res = await searchApi.search({"index":"products","query":{"match_all":{},"equals":{"any(product_codes)":3}}}})'
java
searchRequest = new SearchRequest();
searchRequest.setIndex("forum");
query = new HashMap<String,Object>();
query.put("match_all",null);
query.put("equals",new HashMap<String,Integer>(){{
     put("any(product_codes)",3);
}});
searchRequest.setQuery(query);
searchResponse = searchApi.search(searchRequest);
C#
object query =  new { match_all=null };
var searchRequest = new SearchRequest("forum", query);
searchRequest.AttrFilter = new EqualsFilter("any(product_codes)", 3);
var searchResponse = searchApi.Search(searchRequest);

Information like least or greatest element and length of the list can be extracted. An example shows ordering by the least element of a multi-value attribute.

SQL
select least(product_codes) l from products order by l asc
JSON
POST /search
{
  "index": "products",
  "query":
  {
    "match_all": {},
    "sort": [ { "product_codes":{ "order":"asc", "mode":"min" } } ]
  }
}
PHP
$index->setName('products')->search('')->sort('product_codes','asc','min')->get();
Python
searchApi.search({"index":"products","query":{"match_all":{},"sort":[{"product_codes":{"order":"asc","mode":"min"}}]}})
javascript
res = await searchApi.search({"index":"products","query":{"match_all":{},"sort":[{"product_codes":{"order":"asc","mode":"min"}}]}});
java
searchRequest = new SearchRequest();
searchRequest.setIndex("forum");
query = new HashMap<String,Object>();
query.put("match_all",null);
searchRequest.setQuery(query);
searchRequest.setSort(new ArrayList<Object>(){{
    add(new HashMap<String,String>(){{ put("product_codes",new HashMap<String,String>(){{ put("order","asc");put("mode","min");}});}});
}});
searchResponse = searchApi.search(searchRequest);
C#
object query =  new { match_all=null };
var searchRequest = new SearchRequest("forum", query);
searchRequest.Sort = new List<Object> {
    new SortMVA("product_codes", SortOrder.OrderEnum.Asc, SortMVA.ModeEnum.Min)
};
searchResponse = searchApi.search(searchRequest);

When grouping by a multi-value attribute, a document will contribute to as many groups as there are different values associated with that document. For instance, if a collection contains exactly one document having a 'product_codes' multi-value attribute with values 5, 7, and 11, grouping on 'product_codes' will produce 3 groups with COUNT(*)equal to 1 and GROUPBY() key values of 5, 7, and 11, respectively. Also, note that grouping by multi-value attributes may lead to duplicate documents in the result set because each document can participate in many groups.

SQL
insert into products values ( 1, 'doc one', (5,7,11) );
select id, count(*), groupby() from products group by product_codes;
Query OK, 1 row affected (0.00 sec)

+------+----------+-----------+
| id   | count(*) | groupby() |
+------+----------+-----------+
|    1 |        1 |        11 |
|    1 |        1 |         7 |
|    1 |        1 |         5 |
+------+----------+-----------+
3 rows in set (0.00 sec)

The order of the numbers inserted as values of multivalued attributes is not preserved. Values are stored internally as a sorted set.

SQL
insert into product values (1,'first',(4,2,1,3));
select * from products;
Query OK, 1 row affected (0.00 sec)

+------+---------------+-------+
| id   | product_codes | title |
+------+---------------+-------+
|    1 | 1,2,3,4       | first |
+------+---------------+-------+
1 row in set (0.01 sec)
JSON
POST /insert
{
    "index":"products",
    "id":1,
    "doc":
    {
        "title":"first",
        "product_codes":[4,2,1,3]
    }
}

POST /search
{
  "index": "products",
  "query": { "match_all": {} }
}
{
   "_index":"products",
   "_id":1,
   "created":true,
   "result":"created",
   "status":201
}

{
   "took":0,
   "timed_out":false,
   "hits":{
      "total":1,
      "hits":[
         {
            "_id":"1",
            "_score":1,
            "_source":{
               "product_codes":[
                  1,
                  2,
                  3,
                  4
               ],
               "title":"first"
            }
         }
      ]
   }
}
PHP
$index->addDocument([
    "title"=>"first",
    "product_codes"=>[4,2,1,3]
]);
$index->search('')-get();
Array
(
    [_index] => products
    [_id] => 1
    [created] => 1
    [result] => created
    [status] => 201
)
Array
(
    [took] => 0
    [timed_out] =>
    [hits] => Array
        (
            [total] => 1
            [hits] => Array
                (
                    [0] => Array
                        (
                            [_id] => 1
                            [_score] => 1
                            [_source] => Array
                                (
                                    [product_codes] => Array
                                        (
                                            [0] => 1
                                            [1] => 2
                                            [2] => 3
                                            [3] => 4
                                        )

                                    [title] => first
                                )
                        )
                )
        )
)
Python
indexApi.insert({"index":"products","id":1,"doc":{"title":"first","product_codes":[4,2,1,3]}})
searchApi.search({"index":"products","query":{"match_all":{}}})
{'created': True,
 'found': None,
 'id': 1,
 'index': 'products',
 'result': 'created'}
{'hits': {'hits': [{u'_id': u'1',
                    u'_score': 1,
                    u'_source': {u'product_codes': [1, 2, 3, 4],
                                 u'title': u'first'}}],
          'total': 1},
 'profile': None,
 'timed_out': False,
 'took': 29}
javascript
await indexApi.insert({"index":"products","id":1,"doc":{"title":"first","product_codes":[4,2,1,3]}});
res = await searchApi.search({"index":"products","query":{"match_all":{}}});
{"took":0,"timed_out":false,"hits":{"total":1,"hits":[{"_id":"1","_score":1,"_source":{"product_codes":[1,2,3,4],"title":"first"}}]}}
java
InsertDocumentRequest newdoc = new InsertDocumentRequest();
HashMap<String,Object> doc = new HashMap<String,Object>(){{
    put("title","first");
    put("product_codes",new int[] {4,2,1,3});
}};
newdoc.index("products").id(1L).setDoc(doc);
sqlresult = indexApi.insert(newdoc);
Map<String,Object> query = new HashMap<String,Object>();
query.put("match_all",null);
SearchRequest searchRequest = new SearchRequest();
searchRequest.setIndex("products");
searchRequest.setQuery(query);
SearchResponse searchResponse = searchApi.search(searchRequest);
System.out.println(searchResponse.toString() );
class SearchResponse {
    took: 0
    timedOut: false
    hits: class SearchResponseHits {
        total: 1
        hits: [{_id=1, _score=1, _source={product_codes=[1, 2, 3, 4], title=first}}]
        aggregations: null
    }
    profile: null
}
C#
Dictionary<string, Object> doc = new Dictionary<string, Object>();
doc.Add("title", "first");
doc.Add("product_codes", new List<Object> {4,2,1,3});
InsertDocumentRequest newdoc = new InsertDocumentRequest(index: "products", id: 1, doc: doc);
var sqlresult = indexApi.Insert(newdoc);
object query =  new { match_all=null };
var searchRequest = new SearchRequest("products", query);
var searchResponse = searchApi.Search(searchRequest);
Console.WriteLine(searchResponse.ToString())
class SearchResponse {
    took: 0
    timedOut: false
    hits: class SearchResponseHits {
        total: 1
        hits: [{_id=1, _score=1, _source={product_codes=[1, 2, 3, 4], title=first}}]
        aggregations: null
    }
    profile: null
}

Multi-value big integer

A data type that allows storing variable-length lists of 64-bit signed integers. It has the same functionality as multi-value integer.

SQL
CREATE TABLE products(title text, values multi64);
JSON
POST /cli -d "CREATE TABLE products(title text, values multi64)"
PHP
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
    'title'=>['type'=>'text'],
    'values'=>['type'=>'multi64']
]);
Python
utilsApi.sql('CREATE TABLE products(title text, values multi64))')
javascript
res = await utilsApi.sql('CREATE TABLE products(title text, values multi64))');
java
utilsApi.sql("CREATE TABLE products(title text, values multi64))");
C#
utilsApi.Sql("CREATE TABLE products(title text, values multi64))");
config
table products
{
    type = rt
    path = products

    rt_field = title
    stored_fields = title

    rt_attr_multi_64 = values
}

Columnar attribute properties

When you use the columnar storage you can specify the following properties for the attributes.

fast_fetch

By default, Manticore Columnar storage stores all attributes in a columnar fashion, as well as in a special docstore row by row. This enables fast execution of queries like SELECT * FROM ..., especially when fetching a large number of records at once. However, if you are sure that you do not need it or wish to save disk space, you can disable it by specifying fast_fetch='0' when creating a table or (if you are defining a table in a config) by using columnar_no_fast_fetch as shown in the following example.

RT mode
create table t(a int, b int fast_fetch='0') engine='columnar'; desc t;
+-------+--------+---------------------+
| Field | Type   | Properties          |
+-------+--------+---------------------+
| id    | bigint | columnar fast_fetch |
| a     | uint   | columnar fast_fetch |
| b     | uint   | columnar            |
+-------+--------+---------------------+
3 rows in set (0.00 sec)
Plain mode
source min {
    type = mysql
    sql_host = localhost
    sql_user = test
    sql_pass =
    sql_db = test
    sql_query = select 1, 1 a, 1 b
    sql_attr_uint = a
    sql_attr_uint = b
}

table tbl {
    path = tbl/col
    source = min
    columnar_attrs = *
    columnar_no_fast_fetch = b
}
+-------+--------+---------------------+
| Field | Type   | Properties          |
+-------+--------+---------------------+
| id    | bigint | columnar fast_fetch |
| a     | uint   | columnar fast_fetch |
| b     | uint   | columnar            |
+-------+--------+---------------------+
¶ 6.2

Creating a local table

In Manticore Search, there are two ways to manage tables:

Online schema management (RT mode)

Real-time mode requires no table definition in the configuration file. However, the data_dir directive in the searchd section is mandatory. Index files are stored inside the data_dir.

Replication is only available in this mode.

You can use SQL commands such as CREATE TABLE, ALTER TABLE and DROP TABLE to create and modify table schema, and to drop it. This mode is particularly useful for real-time and percolate tables.

Table names are converted to lowercase when created.

Defining table schema in config (Plain mode)

In this mode, you can specify the table schema in the configuration file. Manticore reads this schema on startup and creates the table if it doesn't exist yet. This mode is particularly useful for plain tables that use data from an external storage.

To drop a table, remove it from the configuration file or remove the path setting and send a HUP signal to the server or restart it.

Table names are case-sensitive in this mode.

All table types are supported in this mode.

Table types and modes

Table type RT mode Plain mode
Real-time supported supported
Plain not supported supported
Percolate supported supported
Distributed supported supported
Template not supported supported
¶ 6.2.1

Real-time table

A real-time table is a main type of table in Manticore. It lets you add, update, and delete documents, and you can see these changes right away. You can set up a real-time Table in a configuration file or use commands like CREATE, UPDATE, DELETE, or ALTER.

Internally a real-time table consists of one or more plain tables called chunks. There are two kinds of chunks:

The size of the RAM chunk is controlled by the rt_mem_limit setting. Once this limit is reached, the RAM chunk is transferred to disk as a disk chunk. If there are too many disk chunks, Manticore combines some of them to improve performance.

Creating a real-time table:

You can create a new real-time table in two ways: by using the CREATE TABLE command or through the _mapping endpoint of the HTTP JSON API.

CREATE TABLE command:

You can use this command via both SQL and HTTP protocols:

SQL
CREATE TABLE products(title text, price float) morphology='stem_en';
Query OK, 0 rows affected (0.00 sec)
JSON
POST /cli -d "CREATE TABLE products(title text, price float)  morphology='stem_en'"
{
"total":0,
"error":"",
"warning":""
}
PHP
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
    'title'=>['type'=>'text'],
    'price'=>['type'=>'float'],
]);
Python
utilsApi.sql('CREATE TABLE forum(title text, price float)')
Javascript
res = await utilsApi.sql('CREATE TABLE forum(title text, price float)');
Java
utilsApi.sql("CREATE TABLE forum(title text, price float)");
C#
utilsApi.Sql("CREATE TABLE forum(title text, price float)");
CONFIG
table products {
  type = rt
  path = tbl
  rt_field = title
  rt_attr_uint = price
  stored_fields = title
}

_mapping API:

Alternatively, you can create a new table via the _mapping endpoint. This endpoint allows you to define an Elasticsearch-like table structure to be converted to a Manticore table.

The body of your request must have the following structure:

"properties"
{
  "FIELD_NAME_1":
  {
    "type": "FIELD_TYPE_1"
  },
  "FIELD_NAME_2":
  {
    "type": "FIELD_TYPE_2"
  },

  ...

  "FIELD_NAME_N":
  {
    "type": "FIELD_TYPE_M"
  }
}

When creating a table, Elasticsearch data types will be mapped to Manticore types according to the following rules:
- aggregate_metric => json
- binary => string
- boolean => bool
- byte => int
- completion => string
- date => timestamp
- date_nanos => bigint
- date_range => json
- dense_vector => json
- flattened => json
- flat_object => json
- float => float
- float_range => json
- geo_point => json
- geo_shape => json
- half_float => float
- histogram => json
- integer => int
- integer_range => json
- ip => string
- ip_range => json
- keyword => string
- knn_vector => float_vector
- long => bigint
- long_range => json
- match_only_text => text
- object => json
- point => json
- scaled_float => float
- search_as_you_type => text
- shape => json
- short => int
- text => text
- unsigned_long => int
- version => string

JSON
POST /your_table_name/_mapping -d '
{
  "test": {
    "mappings": {
      "properties": {
        "price": {
            "type": "float"
        },
        "title": {
            "type": "text"
        }
      }
    }
  }
}
'
{
"total":0,
"error":"",
"warning":""
}

👍 What you can do with a real-time table:

⛔ What you cannot do with a real-time table:

Real-time table files structure

The following table outlines the different file extensions and their respective descriptions in a real-time table:

Extension Description
.lock A lock file that ensures that only one process can access the table at a time.
.ram The RAM chunk of the table, stored in memory and used as an accumulator of changes.
.meta The headers of the real-time table that define its structure and settings.
.*.sp* Disk chunks that are stored on disk with the same format as plain tables. They are created when the RAM chunk size exceeds the rt_mem_limit.

For more information on the structure of disk chunks, refer to the plain table files structure.

¶ 6.2.2

Plain table

Plain table is a basic element for non-percolate searching. It can be defined only in a configuration file using the Plain mode, and is not supported in the RT mode. It is typically used in conjunction with a source to process data from the external storage and can later be attached to a real-time table.

Creating a plain table

To create a plain table, you'll need to define it in a configuration file. It's not supported by the CREATE TABLE command.

Here's an example of a plain table configuration and a source for fetching data from a MySQL database:

How to create a plain table

Plain table example
source source {
  type             = mysql
  sql_host         = localhost
  sql_user         = myuser
  sql_pass         = mypass
  sql_db           = mydb
  sql_query        = SELECT id, title, description, category_id  from mytable
  sql_attr_uint    = category_id
  sql_field_string = title
 }

table tbl {
  type   = plain
  source = source
  path   = /path/to/table
 }

👍 What you can do with a plain table:

⛔ What you cannot do with a plain table:

Numeric attributes, including MVAs, are the only elements that can be updated in a plain table. All other data in the table is immutable. If updates or new records are required, the table must be rebuilt. During the rebuilding process, the existing table remains available to serve requests, and a process called rotation is performed when the new version is ready, bringing it online and discarding the old version.

Plain table building performance

The speed at which a plain table is indexed depends on several factors, including:

Plain table building scenarios

Rebuild fully when needed

For small data sets, the simplest option is to have a single plain table that is fully rebuilt as needed. This approach is acceptable when:

Main+delta scenario

For larger data sets, a plain table can be used instead of a Real-Time. The main+delta scenario involves:

This approach allows for infrequent rebuilding of the larger table and more frequent processing of updates from the source. The smaller table can be rebuilt more often (e.g. every minute or even every few seconds).

However, as time goes on, the indexing duration for the smaller table will become too long, requiring a rebuild of the larger table and the emptying of the smaller one.

The main+delta schema is explained in detail in this interactive course.

The mechanism of kill list and killlist_target directive is used to ensure that documents from the current table take precedence over those from the other table.

For more information on this topic, see here.

Plain table files structure

The following table outlines the various file extensions used in a plain table and their respective descriptions:

Extension Description
.spa stores document attributes in row-wise mode
.spb stores blob attributes in row-wise mode: strings, MVA, json
.spc stores document attributes in columnar mode
.spd stores matching document ID lists for each word ID
.sph stores table header information
.sphi stores histograms of attribute values
.spi stores word lists (word IDs and pointers to .spd file)
.spidx stores secondary indexes data
.spk stores kill-lists
.spl lock file
.spm stores a bitmap of killed documents
.spp stores hit (aka posting, aka word occurrence) lists for each word ID
.spt stores additional data structures to speed up lookups by document ids
.spe stores skip-lists to speed up doc-list filtering
.spds stores document texts
.tmp* temporary files during index_settings_and_status
.new.sp* new version of a plain table before rotation
.old.sp* old version of a plain table after rotation
¶ 6.2.3

Plain and real-time table settings

Defining table schema in a configuration file

table <index_name>[:<parent table name>] {
...
}
Plain
table <table name> {
  type = plain
  path = /path/to/table
  source = <source_name>
  source = <another source_name>
  [stored_fields = <comma separated list of full-text fields that should be stored, all are stored by default, can be empty>]
}
Real-time
table <table name> {
  type = rt
  path = /path/to/table

  rt_field = <full-text field name>
  rt_field = <another full-text field name>
  [rt_attr_uint = <integer field name>]
  [rt_attr_uint = <another integer field name, limit by N bits>:N]
  [rt_attr_bigint = <bigint field name>]
  [rt_attr_bigint = <another bigint field name>]
  [rt_attr_multi = <multi-integer (MVA) field name>]
  [rt_attr_multi = <another multi-integer (MVA) field name>]
  [rt_attr_multi_64 = <multi-bigint (MVA) field name>]
  [rt_attr_multi_64 = <another multi-bigint (MVA) field name>]
  [rt_attr_float = <float field name>]
  [rt_attr_float = <another float field name>]
  [rt_attr_float_vector = <float vector field name>]
  [rt_attr_float_vector = <another float vector field name>]
  [rt_attr_bool = <boolean field name>]
  [rt_attr_bool = <another boolean field name>]
  [rt_attr_string = <string field name>]
  [rt_attr_string = <another string field name>]
  [rt_attr_json = <json field name>]
  [rt_attr_json = <another json field name>]
  [rt_attr_timestamp = <timestamp field name>]
  [rt_attr_timestamp = <another timestamp field name>]

  [stored_fields = <comma separated list of full-text fields that should be stored, all are stored by default, can be empty>]

  [rt_mem_limit = <RAM chunk max size, default 128M>]
  [optimize_cutoff = <max number of RT table disk chunks>]

}

Common plain and real-time tables settings

type

type = plain

type = rt

Table type: "plain" or "rt" (real-time)

Value: plain (default), rt

path

path = path/to/table

The path to where the table will be stored or located, either absolute or relative, without the extension.

Value: The path to the table, mandatory

stored_fields

stored_fields = title, content

By default, the original content of full-text fields is indexed and stored when a table is defined in a configuration file. This setting allows you to specify the fields that should have their original values stored.

Value: A comma-separated list of full-text fields that should be stored. An empty value (i.e. stored_fields = ) disables the storage of original values for all fields.

Note: In the case of a real-time table, the fields listed in stored_fields should also be declared as rt_field.

Also, note that you don't need to list attributes in stored_fields, since their original values are stored anyway. stored_fields can only be used for full-text fields.

See also docstore_block_size, docstore_compression for document storage compression options.

SQL
CREATE TABLE products(title text, content text stored indexed, name text indexed, price float)
JSON
POST /cli -d "
CREATE TABLE products(title text, content text stored indexed, name text indexed, price float)"
PHP
$params = [
    'body' => [
        'columns' => [
            'title'=>['type'=>'text'],
            'content'=>['type'=>'text', 'options' => ['indexed', 'stored']],
            'name'=>['type'=>'text', 'options' => ['indexed']],
            'price'=>['type'=>'float']
        ]
    ],
    'index' => 'products'
];
$index = new \Manticoresearch\Index($client);
$index->create($params);
Python
utilsApi.sql('CREATE TABLE products(title text, content text stored indexed, name text indexed, price float)')
Javascript
res = await utilsApi.sql('CREATE TABLE products(title text, content text stored indexed, name text indexed, price float)');
Java
utilsApi.sql("CREATE TABLE products(title text, content text stored indexed, name text indexed, price float)");
C#
utilsApi.Sql("CREATE TABLE products(title text, content text stored indexed, name text indexed, price float)");
CONFIG
table products {
  stored_fields = title, content # we want to store only "title" and "content", "name" shouldn't be stored

  type = rt
  path = tbl
  rt_field = title
  rt_field = content
  rt_field = name
  rt_attr_uint = price
}

stored_only_fields

stored_only_fields = title,content

List of fields that will be stored in the table, but not indexed. This setting works similarly to stored_fields except that when a field is specified in stored_only_fields t will only be stored, not indexed, and cannot be searched using full-text queries. It can only be retrieved in search results.

The value is a comma-separated list of fields that should be stored only, not indexed. By default, this value is empty. If a real-time table is being defined, the fields listed in stored_only_fields must also be declared as rt_field.

Note also, that you don't need to list attributes instored_only_fields,since their original values are stored anyway. If to compare stored_only_fields to string attributes the former (stored field):

In contrast, the latter (string attribute):

Real-time table settings:

optimize_cutoff

The maximum number of disk chunks for the RT table. Learn more here.

rt_field

rt_field = subject

This field declaration determines the full-text fields that will be indexed. The field names must be unique, and the order is preserved. When inserting data, the field values must be in the same order as specified in the configuration.

This is a multi-value, optional field.

rt_attr_uint

rt_attr_uint = gid

This declaration defines an unsigned integer attribute.

Value: the field name or field_name:N (where N is the maximum number of bits to keep).

rt_attr_bigint

rt_attr_bigint = gid

This declaration defines a BIGINT attribute.

Value: field name, multiple records allowed.

rt_attr_multi

rt_attr_multi = tags

Declares a multi-valued attribute (MVA) with unsigned 32-bit integer values.

Value: field name. Multiple records allowed.

rt_attr_multi_64

rt_attr_multi_64 = wide_tags

Declares a multi-valued attribute (MVA) with signed 64-bit BIGINT values.

Value: field name. Multiple records allowed.

rt_attr_float

rt_attr_float = lat
rt_attr_float = lon

Declares floating point attributes with single precision, 32-bit IEEE 754 format.

Value: field name. Multiple records allowed.

rt_attr_float_vector

rt_attr_float_vector = image_vector

Declares a vector of floating-point values.

Value: field name. Multiple records allowed.

rt_attr_bool

rt_attr_bool = available

Declares a boolean attribute with 1-bit unsigned integer values.

Value: field name.

rt_attr_string

rt_attr_string = title

String attribute declaration.

Value: field name.

rt_attr_json

rt_attr_json = properties

Declares a JSON attribute.

Value: field name.

rt_attr_timestamp

rt_attr_timestamp = date_added

Declares a timestamp attribute.

Value: field name.

rt_mem_limit

rt_mem_limit = 512M

Memory limit for a RAM chunk of the table. Optional, default is 128M.

RT tables store some data in memory, known as the "RAM chunk," and also maintain a number of on-disk tables, referred to as "disk chunks." This directive allows you to control the size of the RAM chunk. When there is too much data to keep in memory, RT tables will flush it to disk, activate a newly created disk chunk, and reset the RAM chunk.

Please note that the limit is strict, and RT tables will never allocate more memory than what is specified in the rt_mem_limit. Additionally, memory is not preallocated, so specifying a 512MB limit and only inserting 3MB of data will result in allocating only 3MB, not 512MB.

The rt_mem_limit is never exceeded, but the actual RAM chunk size can be significantly lower than the limit. RT tables adapt to the data insertion pace and adjust the actual limit dynamically to minimize memory usage and maximize data write speed. This is how it works:

For instance, if 90MB of data is saved to a disk chunk and an additional 10MB of data arrives while the save is in progress, the rate would be 90%. Next time, the RT table will collect up to 90% of rt_mem_limit before flushing the data. The faster the insertion pace, the lower the rt_mem_limit rate. The rate varies between 33.3% to 95%. You can view the current rate of a table using the SHOW TABLE STATUS command.

How to change rt_mem_limit and optimize_cutoff

In real-time mode, you can adjust the size limit of RAM chunks and the maximum number of disk chunks using the ALTER TABLE statement. To set rt_mem_limit to 1 gigabyte for the table "t," run the following query: ALTER TABLE t rt_mem_limit='1G'. To change the maximum number of disk chunks, run the query: ALTER TABLE t optimize_cutoff='5'.

In the plain mode, you can change the values of rt_mem_limit and optimize_cutoff by updating the table configuration or running the command ALTER TABLE <index_name> RECONFIGURE

Important notes about RAM chunks

Plain table settings:

source

source = srcpart1
source = srcpart2
source = srcpart3

The source field specifies the source from which documents will be obtained during indexing of the current table. There must be at least one source. The sources can be of different types (e.g. one could be MySQL, another PostgreSQL). For more information on indexing from external storages, indexing from external storages here

Value: The name of the source is mandatory. Multiple values are allowed.

killlist_target

killlist_target = main:kl

This setting determines the table(s) to which the kill-list will be applied. Matches in the targeted table that are updated or deleted in the current table will be suppressed. In :kl mode, the documents to suppress are taken from the kill-list. In :id mode, all document IDs from the current table are suppressed in the targeted one. If neither is specified, both modes will take effect. Learn more about kill-lists here

Value: not specified (default), target_index_name:kl, target_index_name:id, target_index_name. Multiple values are allowed

columnar_attrs

columnar_attrs = *
columnar_attrs = id, attr1, attr2, attr3

This configuration setting determines which attributes should be stored in the columnar storage instead of the row-wise storage.

You can set columnar_attrs = * to store all supported data types in the columnar storage.

Additionally, id is a supported attribute to store in the columnar storage.

columnar_strings_no_hash

columnar_strings_no_hash = attr1, attr2, attr3

By default, all string attributes stored in columnar storage store pre-calculated hashes. These hashes are used for grouping and filtering. However, they occupy extra space, and if you don't need to group by that attribute, you can save space by disabling hash generation.

Creating a real-time table online via CREATE TABLE

CREATE TABLE [IF NOT EXISTS] name ( <field name> <field data type> [data type options] [, ...]) [table_options]

For more information on data types, see more about data types here.

Type Equivalent in a configuration file Notes Aliases
text rt_field Options: indexed, stored. Default: both. To keep text stored, but indexed, specify "stored" only. To keep text indexed only, specify "indexed" only. string
integer rt_attr_uint integer int, uint
bigint rt_attr_bigint big integer
float rt_attr_float float
float_vector rt_attr_float_vector a vector of float values
multi rt_attr_multi multi-integer
multi64 rt_attr_multi_64 multi-bigint
bool rt_attr_bool boolean
json rt_attr_json JSON
string rt_attr_string string. Option indexed, attribute will make the value full-text indexed and filterable, sortable and groupable at the same time
timestamp rt_attr_timestamp timestamp
bit(n) rt_attr_uint field_name:N N is the max number of bits to keep
SQL
CREATE TABLE products (title text, price float) morphology='stem_en'
This creates the "products" table with two fields: "title" (full-text) and "price" (float), and sets the "morphology" to "stem_en".
CREATE TABLE products (title text indexed, description text stored, author text, price float)
This creates the "products" table with three fields: * "title" is indexed, but not stored. * "description" is stored, but not indexed. * "author" is both stored and indexed.

Engine

create table ... engine='columnar';
create table ... engine='rowwise';

The engine setting changes the default attribute storage for all attributes in the table. You can also specify engine separately for each attribute.

For information on how to enable columnar storage for a plain table, see columnar_attrs .

Values:

Other settings

The following settings are applicable for both real-time and plain tables, regardless of whether they are specified in a configuration file or set online using the CREATE or ALTER command.

Accessing table files

Manticore supports two access modes for reading table data: seek+read and mmap.

In seek+read mode, the server uses the pread system call to read document lists and keyword positions, represented by the*.spd and *.spp files. The server uses internal read buffers to optimize the reading process, and the size of these buffers can be adjusted using the options read_buffer_docs and read_buffer_hits.There is also the option preopen that controls how Manticore opens files at start.

In mmap access mode, the search server maps the table's file into memory using the mmap system call, and the OS caches the file contents. The options read_buffer_docs and read_buffer_hits have no effect for corresponding files in this mode. The mmap reader can also lock the table's data in memory using themlock privileged call, which prevents the OS from swapping the cached data out to disk.

To control which access mode to use, the options access_plain_attrs, access_blob_attrs, access_doclists, access_hitlists and access_dict are available, with the following values:

Value Description
file server reads the table files from disk with seek+read using internal buffers on file access
mmap server maps the table files into memory and OS caches up its contents on file access
mmap_preread server maps the table files into memory and a background thread reads it once to warm up the cache
mlock server maps the table files into memory and then executes the mlock() system call to cache up the file contents and lock it into memory to prevent it being swapped out
Setting Values Description
access_plain_attrs mmap, mmap_preread (default), mlock controls how *.spa (plain attributes) *.spe (skip lists) *.spt (lookups) *.spm (killed docs) will be read
access_blob_attrs mmap, mmap_preread (default), mlock controls how *.spb (blob attributes) (string, mva and json attributes) will be read
access_doclists file (default), mmap, mlock controls how *.spd (doc lists) data will be read
access_hitlists file (default), mmap, mlock controls how *.spp (hit lists) data will be read
access_dict mmap, mmap_preread (default), mlock controls how *.spi (dictionary) will be read

Here is a table which can help you select your desired mode:

table part keep it on disk keep it in memory cached in memory on server start lock it in memory
plain attributes in row-wise (non-columnar) storage, skip lists, word lists, lookups, killed docs mmap mmap mmap_preread (default) mlock
row-wise string, multi-value attributes (MVA) and json attributes mmap mmap mmap_preread (default) mlock
columnar numeric, string and multi-value attributes always only by means of OS no not supported
doc lists file (default) mmap no mlock
hit lists file (default) mmap no mlock
dictionary mmap mmap mmap_preread (default) mlock
The recommendations are:

The default mode offers a balance of:

This provides a decent search performance, optimal memory utilization, and faster searchd restart in most scenarios.

attr_update_reserve

attr_update_reserve = 256k

This setting reserves extra space for updates to blob attributes such as multi-value attributes (MVA), strings, and JSON. The default value is 128k. When updating these attributes, their length may change. If the updated string is shorter than the previous one, it will overwrite the old data in the *.spb file. If the updated string is longer, it will be written to the end of the *.spb file. This file is memory-mapped, making resizing it a potentially slow process, depending on the operating system's memory-mapped file implementation. To avoid frequent resizing, you can use this setting to reserve extra space at the end of the .spb file.

Value: size, default 128k.

docstore_block_size

docstore_block_size = 32k

This setting controls the size of blocks used by the document storage. The default value is 16kb. When original document text is stored using stored_fields or stored_only_fields, it is stored within the table and compressed for efficiency. To optimize disk access and compression ratios for small documents, these documents are concatenated into blocks. The indexing process collects documents until their total size reaches the threshold specified by this option. At that point, the block of documents is compressed. This option can be adjusted to achieve better compression ratios (by increasing the block size) or faster access to document text (by decreasing the block size).

Value: size, default 16k.

docstore_compression

docstore_compression = lz4hc

This setting determines the type of compression used for compressing blocks of documents stored in document storage. If stored_fields or stored_only_fields are specified, the document storage stores compressed document blocks. 'lz4' offers fast compression and decompression speeds, while 'lz4hc' (high compression) sacrifices some compression speed for a better compression ratio. 'none' disables compression completely.

Values: lz4 (default), lz4hc, none.

docstore_compression_level

docstore_compression_level = 12

The compression level used when 'lz4hc' compression is applied in document storage. By adjusting the compression level, you can find the right balance between performance and compression ratio when using 'lz4hc' compression. Note that this option is not applicable when using 'lz4' compression.

Value: An integer between 1 and 12, with a default of 9.

preopen

preopen = 1

This setting indicates that searchd should open all table files on startup or rotation, and keep them open while running. By default, the files are not pre-opened. Pre-opened tables require a few file descriptors per table, but they eliminate the need for per-query open() calls and are immune to race conditions that might occur during table rotation under high load. However, if you are serving many tables, it may still be more efficient to open them on a per-query basis in order to conserve file descriptors.

Value: 0 (default), or 1.

read_buffer_docs

read_buffer_docs = 1M

Buffer size for storing the list of documents per keyword. Increasing this value will result in higher memory usage during query execution, but may reduce I/O time.

Value: size, default 256k, minimum value is 8k.

read_buffer_hits

read_buffer_hits = 1M

Buffer size for storing the list of hits per keyword. Increasing this value will result in higher memory usage during query execution, but may reduce I/O time.

Value: size, default 256k, minimum value is 8k.

Plain table disk footprint settings

inplace_enable

inplace_enable = {0|1}

Enables in-place table inversion. Optional, default is 0 (uses separate temporary files).

The inplace_enable option reduces the disk footprint during indexing of plain tables, while slightly slowing down indexing (it uses approximately 2 times less disk, but yields around 90-95% of the original performance).

Indexing is comprised of two primary phases. During the first phase, documents are collected, processed, and partially sorted by keyword, and the intermediate results are written to temporary files (.tmp*). During the second phase, the documents are fully sorted and the final table files are created. Rebuilding a production table on-the-fly requires approximately 3 times the peak disk footprint: first for the intermediate temporary files, second for the newly constructed copy, and third for the old table that will be serving production queries in the meantime. (Intermediate data is comparable in size to the final table.) This may be too much disk footprint for large data collections, and the inplace_enable option can be used to reduce it. When enabled, it reuses the temporary files, outputs the final data back to them, and renames them upon completion. However, this may require additional temporary data chunk relocation, which is where the performance impact comes from.

This directive has no effect on searchd, it only affects the indexer.

CONFIG
table products {
  inplace_enable = 1

  path = products
  source = src_base
}

inplace_hit_gap

inplace_hit_gap = size

The option In-place inversion fine-tuning option. Controls preallocated hitlist gap size. Optional, default is 0.

This directive only affects the searchd tool, and does not have any impact on the indexer.

CONFIG
table products {
  inplace_hit_gap = 1M
  inplace_enable = 1

  path = products
  source = src_base
}

inplace_reloc_factor

inplace_reloc_factor = 0.1

The inplace_reloc_factor setting determines the size of the relocation buffer within the memory arena used during indexing. The default value is 0.1.

This option is optional and only affects the indexer tool, not the searchd server.

CONFIG
table products {
  inplace_reloc_factor = 0.1
  inplace_enable = 1

  path = products
  source = src_base
}

inplace_write_factor

inplace_write_factor = 0.1

Controls the size of the buffer used for in-place writing during indexing. Optional, with a default value of 0.1.

It's important to note that this directive only impacts the indexer tool and not the searchd server.

CONFIG
table products {
  inplace_write_factor = 0.1
  inplace_enable = 1

  path = products
  source = src_base
}

Natural language processing specific settings

The following settings are supported. They are all described in section NLP and tokenization.

¶ 6.2.4

Percolate table

A percolate table is a special table that stores queries rather than documents. It is used for prospective searches, or "search in reverse."

The schema of a percolate table is fixed and contains the following fields:

Field Description
ID An unsigned 64-bit integer with auto-increment functionality. It can be omitted when adding a PQ rule, as described in add a PQ rule
Query Full-text query of the rule, which can be thought of as the value of MATCH clause or JSON /search. If per field operators are used inside the query, the full-text fields need to be declared in the percolate table configuration. If the stored query is only for attribute filtering (without full-text querying), the query value can be empty or omitted. The value of this field should correspond to the expected document schema, which is specified when creating the percolate table.
Filters Optional. Filters are an optional string containing attribute filters and/or expressions, defined the same way as in the WHERE clause or JSON filtering. The value of this field should correspond to the expected document schema, which is specified when creating the percolate table.
Tags Optional. Tags represent a list of string labels separated by commas that can be used for filtering/deleting PQ rules. The tags can also be returned along with matching documents when performing a Percolate query

Note that you do not need to add the above fields when creating a percolate table.

What you need to keep in mind when creating a new percolate table is to specify the expected schema of a document, which will be checked against the rules you will add later. This is done in the same way as for any other local table.

SQL
CREATE TABLE products(title text, meta json) type='pq';
Query OK, 0 rows affected (0.00 sec)
JSON
POST /cli -d "CREATE TABLE products(title text, meta json) type='pq'"
{
"total":0,
"error":"",
"warning":""
}
PHP
$index = [
    'index' => 'products',
    'body' => [
        'columns' => [
            'title' => ['type' => 'text'],
            'meta' => ['type' => 'json']
        ],
        'settings' => [
            'type' => 'pq'
        ]
    ]
];
$client->indices()->create($index);
Array(
    [total] => 0
    [error] =>
    [warning] =>
)
Python
utilsApi.sql('CREATE TABLE products(title text, meta json) type=\'pq\'')
javascript
res = await utilsApi.sql('CREATE TABLE products(title text, meta json) type=\'pq\'');
java
utilsApi.sql("CREATE TABLE products(title text, meta json) type='pq'");
C#
utilsApi.Sql("CREATE TABLE products(title text, meta json) type='pq'");
typescript
res = await utilsApi.sql("CREATE TABLE products(title text, meta json) type='pq'");
go
apiClient.UtilsAPI.Sql(context.Background()).Body("CREATE TABLE products(title text, meta json) type='pq'").Execute()
CONFIG
table products {
  type = percolate
  path = tbl_pq
  rt_field = title
  rt_attr_json = meta
}
¶ 6.2.5

Template table

A Template Table is a special type of table in Manticore that doesn't store any data and doesn't create any files on your disk. Despite this, it can have the same NLP settings as a plain or real-time table. Template tables can be used for the following purposes:

CONFIG
table template {
  type = template
  morphology = stem_en
  wordforms = wordforms.txt
  exceptions = exceptions.txt
  stopwords = stopwords.txt
}
¶ 6.3

NLP and tokenization

¶ 6.3.1

Data tokenization

Manticore doesn't store text as is for performing full-text searching on it. Instead, it extracts words and creates several structures that allow fast full-text searching. From the found words, a dictionary is built, which allows a quick look to discover if the word is present or not in the index. In addition, other structures record the documents and fields in which the word was found (as well as the position of it inside a field). All these are used when a full-text match is performed.

The process of demarcating and classifying words is called tokenization. The tokenization is applied at both indexing and searching, and it operates at the character and word level.

On the character level, the engine allows only certain characters to pass. This is defined by the charset_table. Anything else is replaced with a whitespace (which is considered the default word separator). The charset_table also allows mappings, such as lowercasing or simply replacing one character with another. Besides that, characters can be ignored, blended, defined as a phrase boundary.

At the word level, the base setting is the min_word_len which defines the minimum word length in characters to be accepted in the index. A common request is to match singular with plural forms of words. For this, morphology processors can be used.

Going further, we might want a word to be matched as another one because they are synonyms. For this, the word forms feature can be used, which allows one or more words to be mapped to another one.

Very common words can have some unwanted effects on searching, mostly because of their frequency they require lots of computing to process their doc/hit lists. They can be blacklisted with the stop words functionality. This helps not only in speeding up queries but also in decreasing the index size.

A more advanced blacklisting is bigrams, which allows creating a special token between a "bigram" (common) word and an uncommon word. This can speed up several times when common words are used in phrase searches.

In case of indexing HTML content, it's important not to index the HTML tags, as they can introduce a lot of "noise" in the index. HTML stripping can be used and can be configured to strip, but index certain tag attributes or completely ignore the content of certain HTML elements.

¶ 6.3.2

Supported languages

Manticore supports a wide range of languages, with basic support enabled for most languages via charset_table = non_cjk (which is the default value).

For many languages, Manticore provides a stopwords file that can be used to improve search relevance.

Additionally, advanced morphology is available for a few languages that can significantly improve search relevance by using dictionary-based lemmatization or stemming algorithms for better segmentation and normalization.

The table below lists all supported languages and indicates how to enable:

Language Supported Stopwords file name Advanced morphology Notes
Afrikaans charset_table=non_cjk af -
Arabic charset_table=non_cjk ar morphology=stem_ar (Arabic stemmer); morphology=libstemmer_ar
Armenian charset_table=non_cjk hy -
Assamese specify charset_table specify charset_table manually - -
Basque charset_table=non_cjk eu -
Bengali charset_table=non_cjk bn -
Bishnupriya specify charset_table manually - -
Buhid specify charset_table manually - -
Bulgarian charset_table=non_cjk bg -
Catalan charset_table=non_cjk ca morphology=libstemmer_ca
Chinese charset_table=chinese or ngram_chars=chinese zh morphology=icu_chinese or ngram_chars=1 correspondingly ICU dictionary based segmentation is much more accurate than ngram-based
Croatian charset_table=non_cjk hr -
Kurdish charset_table=non_cjk ckb -
Czech charset_table=non_cjk cz morphology=stem_cz (Czech stemmer)
Danish charset_table=non_cjk da morphology=libstemmer_da
Dutch charset_table=non_cjk nl morphology=libstemmer_nl
English charset_table=non_cjk en morphology=lemmatize_en (single root form); morphology=lemmatize_en_all (all root forms); morphology=stem_en (Porter's English stemmer); morphology=stem_enru (Porter's English and Russian stemmers); morphology=libstemmer_en (English from libstemmer)
Esperanto charset_table=non_cjk eo -
Estonian charset_table=non_cjk et -
Finnish charset_table=non_cjk fi morphology=libstemmer_fi
French charset_table=non_cjk fr morphology=libstemmer_fr
Galician charset_table=non_cjk gl -
Garo specify charset_table manually - -
German charset_table=non_cjk de morphology=lemmatize_de (single root form); morphology=lemmatize_de_all (all root forms); morphology=libstemmer_de
Greek charset_table=non_cjk el morphology=libstemmer_el
Hebrew charset_table=non_cjk he -
Hindi charset_table=non_cjk hi morphology=libstemmer_hi
Hmong specify charset_table manually - -
Ho specify charset_table manually - -
Hungarian charset_table=non_cjk hu morphology=libstemmer_hu
Indonesian charset_table=non_cjk id morphology=libstemmer_id
Irish charset_table=non_cjk ga morphology=libstemmer_ga
Italian charset_table=non_cjk it morphology=libstemmer_it
Japanese ngram_chars=japanese - ngram_chars=japanese ngram_len=1 Requires ngram-based segmentation
Komi specify charset_table manually - -
Korean ngram_chars=korean - ngram_chars=korean ngram_len=1 Requires ngram-based segmentation
Large Flowery Miao specify charset_table manually - -
Latin charset_table=non_cjk la -
Latvian charset_table=non_cjk lv -
Lithuanian charset_table=non_cjk lt morphology=libstemmer_lt
Maba specify charset_table manually - -
Maithili specify charset_table manually - -
Marathi specify charset_table manually - -
Marathi charset_table=non_cjk mr -
Mende specify charset_table manually - -
Mru specify charset_table manually - -
Myene specify charset_table manually - -
Nepali specify charset_table manually - morphology=libstemmer_ne
Ngambay specify charset_table manually - -
Norwegian charset_table=non_cjk no morphology=libstemmer_no
Odia specify charset_table manually - -
Persian charset_table=non_cjk fa -
Polish charset_table=non_cjk pl -
Portuguese charset_table=non_cjk pt morphology=libstemmer_pt
Romanian charset_table=non_cjk ro morphology=libstemmer_ro
Russian charset_table=non_cjk ru morphology=lemmatize_ru (single root form); morphology=lemmatize_ru_all (all root forms); morphology=stem_ru (Porter's Russian stemmer); morphology=stem_enru (Porter's English and Russian stemmers); morphology=libstemmer_ru (from libstemmer)
Santali specify charset_table manually - -
Sindhi specify charset_table manually - -
Slovak charset_table=non_cjk sk -
Slovenian charset_table=non_cjk sl -
Somali charset_table=non_cjk so -
Sotho charset_table=non_cjk st -
Spanish charset_table=non_cjk es morphology=libstemmer_es
Swahili charset_table=non_cjk sw -
Swedish charset_table=non_cjk sv morphology=libstemmer_sv
Sylheti specify charset_table manually - -
Tamil specify charset_table manually - morphology=libstemmer_ta
Thai charset_table=non_cjk th -
Turkish charset_table=non_cjk tr morphology=libstemmer_tr
Ukrainian charset_table=non_cjk,U+0406->U+0456,U+0456,U+0407->U+0457,U+0457,U+0490->U+0491,U+0491 - morphology=lemmatize_uk_all Requires installation of UK lemmatizer
Yoruba charset_table=non_cjk yo -
Zulu charset_table=non_cjk zu -
¶ 6.3.3

Chinese, Japanese and Korean (CJK) languages

Manticore provides built-in support for indexing CJK texts, allowing you to process CJK texts in two different ways:

  1. Precise segmentation using the ICU library. Currently, only Chinese is supported.
SQL
CREATE TABLE products(title text, price float) charset_table = 'cjk' morphology = 'icu_chinese'
JSON
POST /cli -d "
CREATE TABLE products(title text, price float) charset_table = 'cjk' morphology = 'icu_chinese'"
PHP
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
            'title'=>['type'=>'text'],
            'price'=>['type'=>'float']
        ],[
            'charset_table' => 'cjk',
            'morphology' => 'icu_chinese'
        ]);
Python
utilsApi.sql('CREATE TABLE products(title text, price float) charset_table = \'cjk\' morphology = \'icu_chinese\'')
Javascript
res = await utilsApi.sql('CREATE TABLE products(title text, price float) charset_table = \'cjk\' morphology = \'icu_chinese\'');
Java
utilsApi.sql("CREATE TABLE products(title text, price float) charset_table = 'cjk' morphology = 'icu_chinese'");
C#
utilsApi.Sql("CREATE TABLE products(title text, price float) charset_table = 'cjk' morphology = 'icu_chinese'");
CONFIG
table products {
  charset_table = cjk
  morphology = icu_chinese

  type = rt
  path = tbl
  rt_field = title
  rt_attr_uint = price
}
  1. Basic support using the N-gram options ngram_len and ngram_chars
    For each CJK language, there are separate character set tables (chinese, korean, japanese) that can be used, or you can use the common cjk character set table.
SQL
CREATE TABLE products(title text, price float) charset_table = 'non_cjk' ngram_len = '1' ngram_chars = 'cjk'
JSON
POST /cli -d "
CREATE TABLE products(title text, price float) charset_table = 'non_cjk' ngram_len = '1' ngram_chars = 'cjk'"
PHP
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
            'title'=>['type'=>'text'],
            'price'=>['type'=>'float']
        ],[
             'charset_table' => 'non_cjk',
             'ngram_len' => '1',
             'ngram_chars' => 'cjk'
        ]);
Python
utilsApi.sql('CREATE TABLE products(title text, price float) charset_table = \'non_cjk\' ngram_len = \'1\' ngram_chars = \'cjk\'')
javascript
res = await utilsApi.sql('CREATE TABLE products(title text, price float) charset_table = \'non_cjk\' ngram_len = \'1\' ngram_chars = \'cjk\'');
Java
utilsApi.sql("CREATE TABLE products(title text, price float) charset_table = 'non_cjk' ngram_len = '1' ngram_chars = 'cjk'");
C#
utilsApi.Sql("CREATE TABLE products(title text, price float) charset_table = 'non_cjk' ngram_len = '1' ngram_chars = 'cjk'");
CONFIG
table products {
  charset_table = non_cjk
  ngram_len = 1
  ngram_chars = cjk

  type = rt
  path = tbl
  rt_field = title
  rt_attr_uint = price
}

Additionally, there is built-in support for Chinese stopwords with the alias zh.

SQL
CREATE TABLE products(title text, price float) charset_table = 'chinese' morphology = 'icu_chinese' stopwords = 'zh'
JSON
POST /cli -d "
CREATE TABLE products(title text, price float) charset_table = 'chinese' morphology = 'icu_chinese' stopwords = 'zh'"
PHP
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
            'title'=>['type'=>'text'],
            'price'=>['type'=>'float']
        ],[
            'charset_table' => 'chinese',
            'morphology' => 'icu_chinese',
            'stopwords' => 'zh'
        ]);
Python
utilsApi.sql('CREATE TABLE products(title text, price float) charset_table = \'chinese\' morphology = \'icu_chinese\' stopwords = \'zh\'')
javascript
res = await utilsApi.sql('CREATE TABLE products(title text, price float) charset_table = \'chinese\' morphology = \'icu_chinese\' stopwords = \'zh\'');
Java
utilsApi.sql("CREATE TABLE products(title text, price float) charset_table = 'chinese' morphology = 'icu_chinese' stopwords = 'zh'");
C#
utilsApi.Sql("CREATE TABLE products(title text, price float) charset_table = 'chinese' morphology = 'icu_chinese' stopwords = 'zh'");
CONFIG
table products {
  charset_table = chinese
  morphology = icu_chinese
  stopwords = zh

  type = rt
  path = tbl
  rt_field = title
  rt_attr_uint = price
}
¶ 6.3.4

Low-level tokenization

When text is indexed in Manticore, it is split into words and case folding is done so that words like "Abc", "ABC", and "abc" are treated as the same word.

To perform these operations correctly, Manticore must know:

You can configure these settings on a per-table basis using the charset_table option. charset_table specifies an array that maps letter characters to their case-folded versions (or any other characters that you prefer). Characters that are not present in the array are considered to be non-letters and will be treated as word separators during indexing or searching in this table.

The default character set is non_cjk, which includes most languages.

You can also define text pattern replacement rules. For example, with the following rules:

regexp_filter = \**(\d+)\" => \1 inch
regexp_filter = (BLUE|RED) => COLOR

The text RED TUBE 5" LONG would be indexed as COLOR TUBE 5 INCH LONG, and PLANK 2" x 4" would be indexed as PLANK 2 INCH x 4 INCH. These rules are applied in the specified order. The rules also apply to queries, so a search for BLUE TUBE would actually search for COLOR TUBE.

You can learn more about regexp_filter here.

Index configuration options

charset_table

# default
charset_table = non_cjk

# only English and Russian letters
charset_table = 0..9, A..Z->a..z, _, a..z, \
U+410..U+42F->U+430..U+44F, U+430..U+44F, U+401->U+451, U+451

# english charset defined with alias
charset_table = 0..9, english, _

# you can override character mappings by redefining them, e.g. for case insensitive search with German umlauts you can use:
charset_table = non_cjk, U+00E4, U+00C4->U+00E4, U+00F6, U+00D6->U+00F6, U+00FC, U+00DC->U+00FC, U+00DF, U+1E9E->U+00DF

charset_table specifies an array that maps letter characters to their case folded versions (or any other characters if you like). The default character set is non_cjk which includes most non-CJK languages.

charset_table is a workhorse of Manticore's tokenization process, which extracts keywords from document text or query text. It controls what characters are accepted as valid and how they should be transformed (e.g. whether case should be removed or not).

By default, every character maps to 0, which means that it is not considered a valid keyword and is treated as a separator. Once a character is mentioned in the table, it is mapped to another character (most frequently, either to itself or to a lowercase letter) and is treated as a valid keyword part.

charset_table uses a comma-separated list of mappings to declare characters as valid or to map them to other characters. Syntax shortcuts are available for mapping ranges of characters at once:

For characters with codes from 0 to 32, and those in the range of 127 to 8-bit ASCII and Unicode characters, Manticore always treats them as separators. To avoid configuration file encoding issues, 8-bit ASCII characters and Unicode characters must be specified in U+XXX form, where XXX is a hexadecimal code point number. The minimal accepted Unicode character code is U+0021.

If the default mappings are insufficient for your needs, you can redefine the character mappings by specifying them again with another mapping. For example, if the built-in non_cjk array includes characters Ä and ä and maps them both to the ASCII character a, you can redefine those characters by adding the Unicode code points for them, like this:

charset_table = non_cjk,U+00E4,U+00C4

for case sensitive search or

charset_table = non_cjk,U+00E4,U+00C4->U+00E4

for case insensitive search.

SQL
CREATE TABLE products(title text, price float) charset_table = '0..9, A..Z->a..z, _, a..z, U+410..U+42F->U+430..U+44F, U+430..U+44F, U+401->U+451, U+451'
JSON
POST /cli -d "
CREATE TABLE products(title text, price float) charset_table = '0..9, A..Z->a..z, _, a..z, U+410..U+42F->U+430..U+44F, U+430..U+44F, U+401->U+451, U+451'"
PHP
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
            'title'=>['type'=>'text'],
            'price'=>['type'=>'float']
        ],[
            'charset_table' => '0..9, A..Z->a..z, _, a..z, U+410..U+42F->U+430..U+44F, U+430..U+44F, U+401->U+451, U+451'
        ]);
Python
utilsApi.sql('CREATE TABLE products(title text, price float) charset_table = \'0..9, A..Z->a..z, _, a..z, U+410..U+42F->U+430..U+44F, U+430..U+44F, U+401->U+451, U+451\'')
javascript
res = await utilsApi.sql('CREATE TABLE products(title text, price float) charset_table = \'0..9, A..Z->a..z, _, a..z, U+410..U+42F->U+430..U+44F, U+430..U+44F, U+401->U+451, U+451\'');
Java
utilsApi.sql("CREATE TABLE products(title text, price float) charset_table = '0..9, A..Z->a..z, _, a..z, U+410..U+42F->U+430..U+44F, U+430..U+44F, U+401->U+451, U+451'");
C#
utilsApi.Sql("CREATE TABLE products(title text, price float) charset_table = '0..9, A..Z->a..z, _, a..z, U+410..U+42F->U+430..U+44F, U+430..U+44F, U+401->U+451, U+451'");
CONFIG
table products {
  charset_table = 0..9, A..Z->a..z, _, a..z, \
    U+410..U+42F->U+430..U+44F, U+430..U+44F, U+401->U+451, U+451

  type = rt
  path = tbl
  rt_field = title
  rt_attr_uint = price
}

Besides definitions of characters and mappings, there are several built-in aliases that can be used. Current aliases are:

SQL
CREATE TABLE products(title text, price float) charset_table = '0..9, english, _'
JSON
POST /cli -d "
CREATE TABLE products(title text, price float) charset_table = '0..9, english, _'"
PHP
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
            'title'=>['type'=>'text'],
            'price'=>['type'=>'float']
        ],[
            'charset_table' => '0..9, english, _'
        ]);
Python
utilsApi.sql('CREATE TABLE products(title text, price float) charset_table = \'0..9, english, _\'')
javascript
res = await utilsApi.sql('CREATE TABLE products(title text, price float) charset_table = \'0..9, english, _\'');
Java
utilsApi.sql("CREATE TABLE products(title text, price float) charset_table = '0..9, english, _'");
C#
utilsApi.Sql("CREATE TABLE products(title text, price float) charset_table = '0..9, english, _'");
CONFIG
table products {
  charset_table = 0..9, english, _

  type = rt
  path = tbl
  rt_field = title
  rt_attr_uint = price
}

If you want to support different languages in your search, it can be a laborious task to define sets of valid characters and folding rules for all of them. We have simplified this for you by providing default charset tables, non_cjk and cjk, that cover non-CJK and CJK (Chinese, Japanese, Korean) languages respectively. In most cases, these charsets should be sufficient for your needs.

Please note that the following languages are currently not supported:

All other languages listed in the Unicode languages
list
are supported by default.

To work with both cjk and non-cjk languages, set the options in your configuration file as shown below (with an exception for Chinese):

SQL
CREATE TABLE products(title text, price float) charset_table = 'non_cjk' ngram_len = '1' ngram_chars = 'cjk'
JSON
POST /cli -d "
CREATE TABLE products(title text, price float) charset_table = 'non_cjk' ngram_len = '1' ngram_chars = 'cjk'"
PHP
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
            'title'=>['type'=>'text'],
            'price'=>['type'=>'float']
        ],[
             'charset_table' => 'non_cjk',
             'ngram_len' => '1',
             'ngram_chars' => 'cjk'
        ]);
Python
utilsApi.sql('CREATE TABLE products(title text, price float) charset_table = \'non_cjk\' ngram_len = \'1\' ngram_chars = \'cjk\'')
javascript
res = await utilsApi.sql('CREATE TABLE products(title text, price float) charset_table = \'non_cjk\' ngram_len = \'1\' ngram_chars = \'cjk\'');
Java
utilsApi.sql("CREATE TABLE products(title text, price float) charset_table = 'non_cjk' ngram_len = '1' ngram_chars = 'cjk'");
C#
utilsApi.Sql("CREATE TABLE products(title text, price float) charset_table = 'non_cjk' ngram_len = '1' ngram_chars = 'cjk'");
CONFIG
table products {
  charset_table       = non_cjk
  ngram_len           = 1
  ngram_chars         = cjk

  type = rt
  path = tbl
  rt_field = title
  rt_attr_uint = price
}

If you do not require support for cjk-languages, you can simply exclude the ngram_len and ngram_chars
options. For more information on these options, refer to the corresponding documentation sections.

To map one character to multiple characters or vice versa, you can use regexp_filter can be helpful.

blend_chars

blend_chars = +, &, U+23
blend_chars = +, &->+

Blended characters list. Optional, default is empty.

Blended characters are indexed as both separators and valid characters. For example, when & is defined as a blended character and AT&T appears in an indexed document, three different keywords will be indexed, at&t, at and t.

Additionally, blended characters can influence indexing in such a way that keywords are indexed as if the blended characters were not typed at all. This behavior is particularly evident when blend_mode = trim_all is specified. For example, the phrase some_thing will be indexed as some, something, and thing with blend_mode = trim_all.

Care should be taken when using blended characters as defining a character as blended means that it is no longer a separator.

Positions for tokens obtained by replacing blended characters with whitespace are assigned as usual, and regular keywords will be indexed as if there were no blend_chars specified at all. An additional token that mixes blended and non-blended characters will be put at the starting position. For instance, if AT&T company occurs in the very beginning of the text field, at will be given position 1, t position 2, company position 3, and AT&T will also be given position 1, blending with the opening regular keyword. As a result, queries for AT&T or just AT will match that document. A phrase query for "AT T" will also match, as well as a phrase query for "AT&T company".

Blended characters can overlap with special characters used in query syntax, such as T-Mobile or @twitter. Where possible, the query parser will handle the blended character as blended. For instance, if hello @twitter is within quotes (a phrase operator), the query parser will handle the @ symbol as blended. However, if the @ symbol was not within quotes, the character would be handled as an operator. Therefore, it is recommended to escape keywords.

Blended characters can be remapped so that multiple different blended characters can be normalized into one base form. This is useful when indexing multiple alternative Unicode codepoints with equivalent glyphs.

SQL
CREATE TABLE products(title text, price float) blend_chars = '+, &, U+23, @->_'
JSON
POST /cli -d "
CREATE TABLE products(title text, price float) blend_chars = '+, &, U+23, @->_'"
PHP
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
            'title'=>['type'=>'text'],
            'price'=>['type'=>'float']
        ],[
            'blend_chars' => '+, &, U+23, @->_'
        ]);
Python
utilsApi.sql('CREATE TABLE products(title text, price float) blend_chars = \'+, &, U+23, @->_\'')
javascript
res = await utilsApi.sql('CREATE TABLE products(title text, price float) blend_chars = \'+, &, U+23, @->_\'');
Java
utilsApi.sql("CREATE TABLE products(title text, price float) blend_chars = '+, &, U+23, @->_'");
C#
utilsApi.Sql("CREATE TABLE products(title text, price float) blend_chars = '+, &, U+23, @->_'");
CONFIG
table products {
  blend_chars = +, &, U+23, @->_

  type = rt
  path = tbl
  rt_field = title
  rt_attr_uint = price
}

blend_mode

blend_mode = option [, option [, ...]]
option = trim_none | trim_head | trim_tail | trim_both | trim_all | skip_pure

The blended tokens indexing mode is enabled by the blend_mode directive.

By default, tokens that mix blended and non-blended characters get indexed entirely. For example, when both an at-sign and an exclamation are in blend_chars, the string @dude! will be indexed as two tokens: @dude! (with all the blended characters) and dude (without any). As a result, a query of @dude will not match it.

blend_mode adds flexibility to this indexing behavior. It takes a comma-separated list of options, each of which specifies a token indexing variant.

If multiple options are specified, multiple variants of the same token will be indexed. Regular keywords (resulting from that token by replacing blended characters with a separator) are always indexed.

The options are:

Using blend_mode with the example @dude! string above, the setting blend_mode = trim_head, trim_tail would result in two indexed tokens: @dude and dude!. Using trim_both would have no effect because trimming both blended characters results in dude, which is already indexed as a regular keyword. Indexing @U.S.A. with trim_both (and assuming that dot is blended two) would result in U.S.A being indexed. Lastly, skip_pure enables you to ignore sequences of blended characters only. For example, one @@@ two would be indexed as one two, and match that as a phrase. This is not the case by default because a fully blended token gets indexed and offsets the second keyword position.

Default behavior is to index the entire token, equivalent to blend_mode = trim_none.

Be aware that using blend modes limits your search, even with the default mode trim_none if you assume . is a blended character:

Using more modes increases the chance your keyword will match something.

SQL
CREATE TABLE products(title text, price float) blend_mode = 'trim_tail, skip_pure' blend_chars = '+, &'
JSON
POST /cli -d "
CREATE TABLE products(title text, price float) blend_mode = 'trim_tail, skip_pure' blend_chars = '+, &'"
PHP
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
            'title'=>['type'=>'text'],
            'price'=>['type'=>'float']
        ],[
            'blend_mode' => 'trim_tail, skip_pure',
            'blend_chars' => '+, &'
        ]);
Python
utilsApi.sql('CREATE TABLE products(title text, price float) blend_mode = \'trim_tail, skip_pure\' blend_chars = \'+, &\'')
javascript
res = await utilsApi.sql('CREATE TABLE products(title text, price float) blend_mode = \'trim_tail, skip_pure\' blend_chars = \'+, &\'');
Java
utilsApi.sql("CREATE TABLE products(title text, price float) blend_mode = 'trim_tail, skip_pure' blend_chars = '+, &'");
C#
utilsApi.Sql("CREATE TABLE products(title text, price float) blend_mode = 'trim_tail, skip_pure' blend_chars = '+, &'");
CONFIG
table products {
  blend_mode = trim_tail, skip_pure
  blend_chars = +, &

  type = rt
  path = tbl
  rt_field = title
  rt_attr_uint = price
}

min_word_len

min_word_len = length

min_word_len is an optional index configuration option in Manticore that specifies the minimum indexed word length. The default value is 1, which means that everything is indexed.

Only those words that are not shorter than this minimum will be indexed. For example, if min_word_len is 4, then 'the' won't be indexed, but 'they' will be.

SQL
CREATE TABLE products(title text, price float) min_word_len = '4'
JSON
POST /cli -d "
CREATE TABLE products(title text, price float) min_word_len = '4'"
PHP
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
            'title'=>['type'=>'text'],
            'price'=>['type'=>'float']
        ],[
            'min_word_len' => '4'
        ]);
Python
utilsApi.sql('CREATE TABLE products(title text, price float) min_word_len = \'4\'')
javascript
res = await utilsApi.sql('CREATE TABLE products(title text, price float) min_word_len = \'4\'');
Java
utilsApi.sql("CREATE TABLE products(title text, price float) min_word_len = '4'");
C#
utilsApi.Sql("CREATE TABLE products(title text, price float) min_word_len = '4'");
CONFIG
table products {
  min_word_len = 4

  type = rt
  path = tbl
  rt_field = title
  rt_attr_uint = price
}

ngram_len

ngram_len = 1

N-gram lengths for N-gram indexing. Optional, default is 0 (disable n-gram indexing). Known values are 0 and 1.

N-grams provide basic CJK (Chinese, Japanese, Korean) support for unsegmented texts. The issue with CJK searching is that there may be no clear separators between the words. In some cases, you may not want to use dictionary-based segmentation the one available for Chinese. In those cases, n-gram segmentation might work well too.

When this feature is enabled, streams of CJK (or any other characters defined in ngram_chars) are indexed as N-grams. For example, if the incoming text is "ABCDEF" (where A to F represent some CJK characters) and ngram_len is 1, it will be indexed as if it were "A B C D E F". Only ngram_len=1 is currently supported. Only those characters that are listed in ngram_chars table will be split this way; others will not be affected.

Note that if the search query is segmented, i.e. there are separators between individual words, then wrapping the words in quotes and using extended mode will result in proper matches being found even if the text was not segmented. For instance, assume that the original query is BC DEF. After wrapping in quotes on the application side, it should look like "BC" "DEF" (with quotes). This query will be passed to Manticore and internally split into 1-grams too, resulting in "B C" "D E F" query, still with quotes that are the phrase matching operator. And it will match the text even though there were no separators in the text.

Even if the search query is not segmented, Manticore should still produce good results, thanks to phrase-based ranking: it will pull closer phrase matches (which in the case of N-gram CJK words can mean closer multi-character word matches) to the top.

SQL
CREATE TABLE products(title text, price float) ngram_chars = 'cjk' ngram_len = '1'
JSON
POST /cli -d "
CREATE TABLE products(title text, price float) ngram_chars = 'cjk' ngram_len = '1'"
PHP
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
            'title'=>['type'=>'text'],
            'price'=>['type'=>'float']
        ],[
             'ngram_chars' => 'cjk',
             'ngram_len' => '1'
        ]);
Python
utilsApi.sql('CREATE TABLE products(title text, price float) ngram_chars = \'cjk\' ngram_len = \'1\'')
javascript
res = await utilsApi.sql('CREATE TABLE products(title text, price float) ngram_chars = \'cjk\' ngram_len = \'1\'');
Java
utilsApi.sql("CREATE TABLE products(title text, price float) ngram_chars = 'cjk' ngram_len = '1'");
C#
utilsApi.Sql("CREATE TABLE products(title text, price float) ngram_chars = 'cjk' ngram_len = '1'");
CONFIG
table products {
  ngram_chars = cjk
  ngram_len = 1

  type = rt
  path = tbl
  rt_field = title
  rt_attr_uint = price
}

ngram_chars

ngram_chars = cjk

ngram_chars = cjk, U+3000..U+2FA1F

N-gram characters list. Optional, default is empty.

To be used in conjunction with in ngram_len, this list defines characters, sequences of which are subject to N-gram extraction. Words comprised of other characters will not be affected by N-gram indexing feature. The value format is identical to charset_table. N-gram characters cannot appear in the charset_table.

SQL
CREATE TABLE products(title text, price float) ngram_chars = 'U+3000..U+2FA1F' ngram_len = '1'
JSON
POST /cli -d "
CREATE TABLE products(title text, price float) ngram_chars = 'U+3000..U+2FA1F' ngram_len = '1'"
PHP
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
            'title'=>['type'=>'text'],
            'price'=>['type'=>'float']
        ],[
             'ngram_chars' => 'U+3000..U+2FA1F',
             'ngram_len' => '1'
        ]);
Python
utilsApi.sql('CREATE TABLE products(title text, price float) ngram_chars = \'U+3000..U+2FA1F\' ngram_len = \'1\'')
javascript
res = await utilsApi.sql('CREATE TABLE products(title text, price float) ngram_chars = \'U+3000..U+2FA1F\' ngram_len = \'1\'');
Java
utilsApi.sql("CREATE TABLE products(title text, price float) ngram_chars = 'U+3000..U+2FA1F' ngram_len = '1'");
C#
utilsApi.Sql("CREATE TABLE products(title text, price float) ngram_chars = 'U+3000..U+2FA1F' ngram_len = '1'");
CONFIG
table products {
  ngram_chars = U+3000..U+2FA1F
  ngram_len = 1

  type = rt
  path = tbl
  rt_field = title
  rt_attr_uint = price
}

Also you can use an alias for our default N-gram table as in the example. It should be sufficient in most cases.

SQL
CREATE TABLE products(title text, price float) ngram_chars = 'cjk' ngram_len = '1'
JSON
POST /cli -d "
CREATE TABLE products(title text, price float) ngram_chars = 'cjk' ngram_len = '1'"
PHP
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
            'title'=>['type'=>'text'],
            'price'=>['type'=>'float']
        ],[
             'ngram_chars' => 'cjk',
             'ngram_len' => '1'
        ]);
Python
utilsApi.sql('CREATE TABLE products(title text, price float) ngram_chars = \'cjk\' ngram_len = \'1\'')
javascript
res = await utilsApi.sql('CREATE TABLE products(title text, price float) ngram_chars = \'cjk\' ngram_len = \'1\'');
Java
utilsApi.sql("CREATE TABLE products(title text, price float) ngram_chars = 'cjk' ngram_len = '1'");
C#
utilsApi.Sql("CREATE TABLE products(title text, price float) ngram_chars = 'cjk' ngram_len = '1'");
CONFIG
table products {
  ngram_chars = cjk
  ngram_len = 1

  type = rt
  path = tbl
  rt_field = title
  rt_attr_uint = price
}

ignore_chars

ignore_chars = U+AD

Ignored characters list. Optional, default is empty.

Useful in cases when some characters, such as soft hyphenation mark (U+00AD), should be not just treated as separators but rather fully ignored. For example, if '-' is simply not in the charset_table, "abc-def" text will be indexed as "abc" and "def" keywords. On the contrary, if '-' is added to ignore_chars list, the same text will be indexed as a single "abcdef" keyword.

The syntax is the same as for charset_table, but it's only allowed to declare characters, and not allowed to map them. Also, the ignored characters must not be present in charset_table.

SQL
CREATE TABLE products(title text, price float) ignore_chars = 'U+AD'
JSON
POST /cli -d "
CREATE TABLE products(title text, price float) ignore_chars = 'U+AD'"
PHP
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
            'title'=>['type'=>'text'],
            'price'=>['type'=>'float']
        ],[
            'ignore_chars' => 'U+AD'
        ]);
Python
utilsApi.sql('CREATE TABLE products(title text, price float) ignore_chars = \'U+AD\'')
javascript
res = await utilsApi.sql('CREATE TABLE products(title text, price float) ignore_chars = \'U+AD\'');
Java
utilsApi.sql("CREATE TABLE products(title text, price float) ignore_chars = 'U+AD'");
C#
utilsApi.Sql("CREATE TABLE products(title text, price float) ignore_chars = 'U+AD'");
CONFIG
table products {
  ignore_chars = U+AD

  type = rt
  path = tbl
  rt_field = title
  rt_attr_uint = price
}

bigram_index

bigram_index = {none|all|first_freq|both_freq}

Bigram indexing mode. Optional, default is none.

Bigram indexing is a feature to accelerate phrase searches. When indexing, it stores a document list for either all or some of the adjacent words pairs into the index. Such a list can then be used at searching time to significantly accelerate phrase or sub-phrase matching.

bigram_index controls the selection of specific word pairs. The known modes are:

For most use cases, both_freq would be the best mode, but your mileage may vary.

SQL
CREATE TABLE products(title text, price float) bigram_freq_words = 'the, a, you, i' bigram_index = 'both_freq'
JSON
POST /cli -d "
CREATE TABLE products(title text, price float) bigram_freq_words = 'the, a, you, i' bigram_index = 'both_freq'"
PHP
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
            'title'=>['type'=>'text'],
            'price'=>['type'=>'float']
        ],[
            'bigram_freq_words' => 'the, a, you, i',
            'bigram_index' => 'both_freq'
        ]);
Python
utilsApi.sql('CREATE TABLE products(title text, price float) bigram_freq_words = \'the, a, you, i\' bigram_index = \'both_freq\'')
javascript
res = await utilsApi.sql('CREATE TABLE products(title text, price float) bigram_freq_words = \'the, a, you, i\' bigram_index = \'both_freq\'');
Java
utilsApi.sql("CREATE TABLE products(title text, price float) bigram_freq_words = 'the, a, you, i' bigram_index = 'both_freq'");
C#
utilsApi.Sql("CREATE TABLE products(title text, price float) bigram_freq_words = 'the, a, you, i' bigram_index = 'both_freq'");
CONFIG
table products {
  bigram_index = both_freq
  bigram_freq_words = the, a, you, i

  type = rt
  path = tbl
  rt_field = title
  rt_attr_uint = price
}

bigram_freq_words

bigram_freq_words = the, a, you, i

A list of keywords considered "frequent" when indexing bigrams. Optional, default is empty.

Some of the bigram indexing modes (see bigram_index) require to define a list of frequent keywords. These are not to be confused with stop words. Stop words are completely eliminated when both indexing and searching. Frequent keywords are only used by bigrams to determine whether to index a current word pair or not.

bigram_freq_words lets you define a list of such keywords.

SQL
CREATE TABLE products(title text, price float) bigram_freq_words = 'the, a, you, i' bigram_index = 'first_freq'
JSON
POST /cli -d "
CREATE TABLE products(title text, price float) bigram_freq_words = 'the, a, you, i' bigram_index = 'first_freq'"
PHP
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
            'title'=>['type'=>'text'],
            'price'=>['type'=>'float']
        ],[
            'bigram_freq_words' => 'the, a, you, i',
            'bigram_index' => 'first_freq'
        ]);
Python
utilsApi.sql('CREATE TABLE products(title text, price float) bigram_freq_words = \'the, a, you, i\' bigram_index = \'first_freq\'')
javascript
res = await utilsApi.sql('CREATE TABLE products(title text, price float) bigram_freq_words = \'the, a, you, i\' bigram_index = \'first_freq\'');
Java
utilsApi.sql("CREATE TABLE products(title text, price float) bigram_freq_words = 'the, a, you, i' bigram_index = 'first_freq'");
C#
utilsApi.Sql("CREATE TABLE products(title text, price float) bigram_freq_words = 'the, a, you, i' bigram_index = 'first_freq'");
CONFIG
table products {
  bigram_freq_words = the, a, you, i
  bigram_index = first_freq

  type = rt
  path = tbl
  rt_field = title
  rt_attr_uint = price
}

dict

dict = {keywords|crc}

The type of keywords dictionary used is identified by one of two known values, 'crc' or 'keywords'. This is optional, with 'keywords' as the default.

Using the keywords dictionary mode (dict=keywords) can significantly decrease the indexing burden and enable substring searches on extensive collections. This mode can be utilized for both plain and RT tables.

CRC dictionaries do not store the original keyword text in the index. Instead, they replace keywords with a control sum value (computed using FNV64) during both searching and indexing processes. This value is used internally within the index. This approach has two disadvantages:

The keywords dictionary resolves both of these issues. It stores keywords in the index and performs search-time wildcard expansion. For instance, a search for a test* prefix could internally expand to a 'test|tests|testing' query based on the dictionary's contents. This expansion process is entirely invisible to the application, with the exception that the separate per-keyword statistics for all the matched keywords are now also reported.

For substring (infix) searches, extended wildcards can be used. Special characters such as ? and % are compatible with substring (infix) search (e.g., t?st*, run%, *abc*). Note that the wildcards operators and the REGEX only function with dict=keywords.

Indexing with a keywords dictionary is approximately 1.1x to 1.3x slower than regular, non-substring indexing - yet significantly faster than substring indexing (either prefix or infix). The index size should only be slightly larger than that of the standard non-substring table, with a total difference of 1..10% percent. The time it takes for regular keyword searching should be nearly the same or identical across all three index types discussed (CRC non-substring, CRC substring, keywords). Substring searching time can significantly fluctuate based on how many actual keywords match the given substring (i.e., how many keywords the search term expands into). The maximum number of matched keywords is limited by the expansion_limit directive.

In summary, keywords and CRC dictionaries offer two different trade-off decisions for substring searching. You can opt to either sacrifice indexing time and index size to achieve the fastest worst-case searches (CRC dictionary), or minimally impact indexing time but sacrifice worst-case searching time when the prefix expands into a high number of keywords (keywords dictionary).

SQL
CREATE TABLE products(title text, price float) dict = 'keywords'
JSON
POST /cli -d "
CREATE TABLE products(title text, price float) dict = 'keywords'"
PHP
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
            'title'=>['type'=>'text'],
            'price'=>['type'=>'float']
        ],[
             'dict' => 'keywords'
        ]);
Python
utilsApi.sql('CREATE TABLE products(title text, price float) dict = \'keywords\'')
javascript
res = await utilsApi.sql('CREATE TABLE products(title text, price float) dict = \'keywords\'');
Java
utilsApi.sql("CREATE TABLE products(title text, price float) dict = 'keywords'");
C#
utilsApi.Sql("CREATE TABLE products(title text, price float) dict = 'keywords'");
CONFIG
table products {
  dict = keywords

  type = rt
  path = tbl
  rt_field = title
  rt_attr_uint = price
}

embedded_limit

embedded_limit = size

Embedded exceptions, wordforms, or stop words file size limit. Optional, default is 16K.

When you create a table the above mentioned files can be either saved externally along with the table or embedded directly into the table. Files sized under embedded_limit get stored into the table. For bigger files, only the file names are stored. This also simplifies moving table files to a different machine; you may get by just copying a single file.

With smaller files, such embedding reduces the number of the external files on which the table depends, and helps maintenance. But at the same time it makes no sense to embed a 100 MB wordforms dictionary into a tiny delta table. So there needs to be a size threshold, and embedded_limit is that threshold.

CONFIG
table products {
  embedded_limit = 32K

  type = rt
  path = tbl
  rt_field = title
  rt_attr_uint = price
}

global_idf

global_idf = /path/to/global.idf

The path to a file with global (cluster-wide) keyword IDFs. Optional, default is empty (use local IDFs).

On a multi-table cluster, per-keyword frequencies are quite likely to differ across different tables. That means that when the ranking function uses TF-IDF based values, such as BM25 family of factors, the results might be ranked slightly differently depending on what cluster node they reside.

The easiest way to fix that issue is to create and utilize a global frequency dictionary, or a global IDF file for short. This directive lets you specify the location of that file. It is suggested (but not required) to use an .idf extension. When the IDF file is specified for a given table and OPTION global_idf is set to 1, the engine will use the keyword frequencies and collection documents counts from the global_idf file, rather than just the local table. That way, IDFs and the values that depend on them will stay consistent across the cluster.

IDF files can be shared across multiple tables. Only a single copy of an IDF file will be loaded by searchd, even when many tables refer to that file. Should the contents of an IDF file change, the new contents can be loaded with a SIGHUP.

You can build an .idf file using indextool utility, by dumping dictionaries using --dumpdict dict.txt --stats switch first, then converting those to .idf format using --buildidf, then merging all the .idf files across cluster using --mergeidf.

SQL
CREATE TABLE products(title text, price float) global_idf = '/usr/local/manticore/var/global.idf'
JSON
POST /cli -d "
CREATE TABLE products(title text, price float) global_idf = '/usr/local/manticore/var/global.idf'"
PHP
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
            'title'=>['type'=>'text'],
            'price'=>['type'=>'float']
        ],[
             'global_idf' => '/usr/local/manticore/var/global.idf'
        ]);
Python
utilsApi.sql('CREATE TABLE products(title text, price float) global_idf = \'/usr/local/manticore/var/global.idf\'')
javascript
res = await utilsApi.sql('CREATE TABLE products(title text, price float) global_idf = \'/usr/local/manticore/var/global.idf\'');
Java
utilsApi.sql("CREATE TABLE products(title text, price float) global_idf = '/usr/local/manticore/var/global.idf'");
C#
utilsApi.Sql("CREATE TABLE products(title text, price float) global_idf = '/usr/local/manticore/var/global.idf'");
CONFIG
table products {
  global_idf = /usr/local/manticore/var/global.idf

  type = rt
  path = tbl
  rt_field = title
  rt_attr_uint = price
}

hitless_words

hitless_words = {all|path/to/file}

Hitless words list. Optional, allowed values are 'all', or a list file name.

By default, Manticore full-text index stores not only a list of matching documents for every given keyword, but also a list of its in-document positions (known as hitlist). Hitlists enables phrase, proximity, strict order and other advanced types of searching, as well as phrase proximity ranking. However, hitlists for specific frequent keywords (that can not be stopped for some reason despite being frequent) can get huge and thus slow to process while querying. Also, in some cases we might only care about boolean keyword matching, and never need position-based searching operators (such as phrase matching) nor phrase ranking.

hitless_words lets you create indexes that either do not have positional information (hitlists) at all, or skip it for specific keywords.

Hitless index will generally use less space than the respective regular full-text index (about 1.5x can be expected). Both indexing and searching should be faster, at a cost of missing positional query and ranking support.

If used in positional queries (e.g. phrase queries) the hitless words are taken out from them and used as operand without a position. For example if "hello" and "world" are hitless and "simon" and "says" are not hitless, the phrase query "simon says hello world" will be converted to ("simon says" & hello & world), matching "hello" and "world" anywhere in the document and "simon says" as an exact phrase.

A positional query than contains only hitless words will result in an empty phrase node, therefore the entire query will return an empty result and a warning. If the whole dictionary is hitless (using all) only boolean matching can be used on the respective index.

SQL
CREATE TABLE products(title text, price float) hitless_words = 'all'
JSON
POST /cli -d "
CREATE TABLE products(title text, price float) hitless_words = 'all'"
PHP
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
            'title'=>['type'=>'text'],
            'price'=>['type'=>'float']
        ],[
            'hitless_words' => 'all'
        ]);
Python
utilsApi.sql('CREATE TABLE products(title text, price float) hitless_words = \'all\'')
javascript
res = await utilsApi.sql('CREATE TABLE products(title text, price float) hitless_words = \'all\'');
Java
utilsApi.sql("CREATE TABLE products(title text, price float) hitless_words = 'all'");
C#
utilsApi.Sql("CREATE TABLE products(title text, price float) hitless_words = 'all'");
CONFIG
table products {
  hitless_words = all

  type = rt
  path = tbl
  rt_field = title
  rt_attr_uint = price
}

index_field_lengths

index_field_lengths = {0|1}

Enables computing and storing of field lengths (both per-document and average per-index values) into the full-text index. Optional, default is 0 (do not compute and store).

When index_field_lengths is set to 1 Manticore will:

BM25A() and BM25F() functions in the expression ranker are based on these lengths and require index_field_lengths to be enabled. Historically, Manticore used a simplified, stripped-down variant of BM25 that, unlike the complete function, did not account for document length. There's also support for both a complete variant of BM25, and its extension towards multiple fields, called BM25F. They require per-document length and per-field lengths, respectively. Hence the additional directive.

SQL
CREATE TABLE products(title text, price float) index_field_lengths = '1'
JSON
POST /cli -d "
CREATE TABLE products(title text, price float) index_field_lengths = '1'"
PHP
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
            'title'=>['type'=>'text'],
            'price'=>['type'=>'float']
        ],[
            'index_field_lengths' => '1'
        ]);
Python
utilsApi.sql('CREATE TABLE products(title text, price float) index_field_lengths = \'1\'')
javascript
res = await utilsApi.sql('CREATE TABLE products(title text, price float) index_field_lengths = \'1\'');
Java
utilsApi.sql("CREATE TABLE products(title text, price float) index_field_lengths = '1'");
C#
utilsApi.Sql("CREATE TABLE products(title text, price float) index_field_lengths = '1'");
CONFIG
table products {
  index_field_lengths = 1

  type = rt
  path = tbl
  rt_field = title
  rt_attr_uint = price
}

index_token_filter

index_token_filter = my_lib.so:custom_blend:chars=@#&

Index-time token filter for full-text indexing. Optional, default is empty.

The index_token_filter directive specifies an optional index-time token filter for full-text indexing. This directive is used to create a custom tokenizer that makes tokens according to custom rules. The filter is created by the indexer on indexing source data into a plain table or by an RT table on processing INSERT or REPLACE statements. The plugins are defined using the format library name:plugin name:optional string of settings. For example, my_lib.so:custom_blend:chars=@#&.

SQL
CREATE TABLE products(title text, price float) index_token_filter = 'my_lib.so:custom_blend:chars=@#&'
JSON
POST /cli -d "
CREATE TABLE products(title text, price float) index_token_filter = 'my_lib.so:custom_blend:chars=@#&'"
PHP
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
            'title'=>['type'=>'text'],
            'price'=>['type'=>'float']
        ],[
            'index_token_filter' => 'my_lib.so:custom_blend:chars=@#&'
        ]);
Python
utilsApi.sql('CREATE TABLE products(title text, price float) index_token_filter = \'my_lib.so:custom_blend:chars=@#&\'')
javascript
res = await utilsApi.sql('CREATE TABLE products(title text, price float) index_token_filter = \'my_lib.so:custom_blend:chars=@#&\'');
Java
utilsApi.sql("CREATE TABLE products(title text, price float) index_token_filter = 'my_lib.so:custom_blend:chars=@#&'");
C#
utilsApi.Sql("CREATE TABLE products(title text, price float) index_token_filter = 'my_lib.so:custom_blend:chars=@#&'");
CONFIG
table products {
  index_token_filter = my_lib.so:custom_blend:chars=@#&

  type = rt
  path = tbl
  rt_field = title
  rt_attr_uint = price
}

overshort_step

overshort_step = {0|1}

Position increment on overshort (less than min_word_len) keywords. Optional, allowed values are 0 and 1, default is 1.

SQL
CREATE TABLE products(title text, price float) overshort_step = '1'
JSON
POST /cli -d "
CREATE TABLE products(title text, price float) overshort_step = '1'"
PHP
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
            'title'=>['type'=>'text'],
            'price'=>['type'=>'float']
        ],[
            'overshort_step' => '1'
        ]);
Python
utilsApi.sql('CREATE TABLE products(title text, price float) overshort_step = \'1\'')
javascript
res = await utilsApi.sql('CREATE TABLE products(title text, price float) overshort_step = \'1\'');
Java
utilsApi.sql("CREATE TABLE products(title text, price float) overshort_step = '1'");
C#
utilsApi.Sql("CREATE TABLE products(title text, price float) overshort_step = '1'");
CONFIG
table products {
  overshort_step = 1

  type = rt
  path = tbl
  rt_field = title
  rt_attr_uint = price
}

phrase_boundary

phrase_boundary = ., ?, !, U+2026 # horizontal ellipsis

Phrase boundary characters list. Optional, default is empty.

This list controls what characters will be treated as phrase boundaries, in order to adjust word positions and enable phrase-level search emulation through proximity search. The syntax is similar to charset_table, but mappings are not allowed and the boundary characters must not overlap with anything else.

On phrase boundary, additional word position increment (specified by phrase_boundary_step) will be added to current word position. This enables phrase-level searching through proximity queries: words in different phrases will be guaranteed to be more than phrase_boundary_step distance away from each other; so proximity search within that distance will be equivalent to phrase-level search.

Phrase boundary condition will be raised if and only if such character is followed by a separator; this is to avoid abbreviations such as S.T.A.L.K.E.R or URLs being treated as several phrases.

SQL
CREATE TABLE products(title text, price float) phrase_boundary = '., ?, !, U+2026' phrase_boundary_step = '10'
JSON
POST /cli -d "
CREATE TABLE products(title text, price float) phrase_boundary = '., ?, !, U+2026' phrase_boundary_step = '10'"
PHP
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
            'title'=>['type'=>'text'],
            'price'=>['type'=>'float']
        ],[
             'phrase_boundary' => '., ?, !, U+2026',
             'phrase_boundary_step' => '10'
        ]);
Python
utilsApi.sql('CREATE TABLE products(title text, price float) phrase_boundary = \'., ?, !, U+2026\' phrase_boundary_step = \'10\'')
javascript
res = await utilsApi.sql('CREATE TABLE products(title text, price float) phrase_boundary = \'., ?, !, U+2026\' phrase_boundary_step = \'10\'');
Java
utilsApi.sql("CREATE TABLE products(title text, price float) phrase_boundary = '., ?, !, U+2026' phrase_boundary_step = '10'");
C#
utilsApi.Sql("CREATE TABLE products(title text, price float) phrase_boundary = '., ?, !, U+2026' phrase_boundary_step = '10'");
CONFIG
table products {
  phrase_boundary = ., ?, !, U+2026
  phrase_boundary_step = 10

  type = rt
  path = tbl
  rt_field = title
  rt_attr_uint = price
}

phrase_boundary_step

phrase_boundary_step = 100

Phrase boundary word position increment. Optional, default is 0.

On phrase boundary, current word position will be additionally incremented by this number.

SQL
CREATE TABLE products(title text, price float) phrase_boundary_step = '100' phrase_boundary = '., ?, !, U+2026'
JSON
POST /cli -d "
CREATE TABLE products(title text, price float) phrase_boundary_step = '100' phrase_boundary = '., ?, !, U+2026'"
PHP
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
            'title'=>['type'=>'text'],
            'price'=>['type'=>'float']
        ],[
             'phrase_boundary_step' => '100',
             'phrase_boundary' => '., ?, !, U+2026'
        ]);
Python
utilsApi.sql('CREATE TABLE products(title text, price float) phrase_boundary_step = \'100\' phrase_boundary = \'., ?, !, U+2026\'')
javascript
res = await utilsApi.sql('CREATE TABLE products(title text, price float) phrase_boundary_step = \'100\' phrase_boundary = \'., ?, !, U+2026\'');
Java
utilsApi.sql("CREATE TABLE products(title text, price float) phrase_boundary_step = '100' phrase_boundary = '., ?, !, U+2026'");
C#
utilsApi.Sql("CREATE TABLE products(title text, price float) phrase_boundary_step = '100' phrase_boundary = '., ?, !, U+2026'");
CONFIG
table products {
  phrase_boundary_step = 100
  phrase_boundary = ., ?, !, U+2026

  type = rt
  path = tbl
  rt_field = title
  rt_attr_uint = price
}

regexp_filter

# index '13"' as '13inch'
regexp_filter = \b(\d+)\" => \1inch

# index 'blue' or 'red' as 'color'
regexp_filter = (blue|red) => color

Regular expressions (regexps) used to filter the fields and queries. This directive is optional, multi-valued, and its default is an empty list of regular expressions. The regular expressions engine used by Manticore Search is Google's RE2, which is known for its speed and safety. For detailed information on the syntax supported by RE2, you can visit the RE2 syntax guide.

In certain applications such as product search, there can be many ways to refer to a product, model, or property. For example, iPhone 3gs and iPhone 3 gs (or even iPhone3 gs) are very likely to refer to the same product. Another example could be different ways to express a laptop screen size, such as 13-inch, 13 inch, 13", or 13in.

Regexps provide a mechanism to specify rules tailored to handle such cases. In the first example, you could possibly use a wordforms file to handle a handful of iPhone models, but in the second example, it's better to specify rules that would normalize "13-inch" and "13in" to something identical.

Regular expressions listed in regexp_filter are applied in the order they are listed, at the earliest stage possible, before any other processing (including exceptions), even before tokenization. That is, regexps are applied to the raw source fields when indexing, and to the raw search query text when searching.

SQL
CREATE TABLE products(title text, price float) regexp_filter = '(blue|red) => color'
JSON
POST /cli -d "
CREATE TABLE products(title text, price float) regexp_filter = '(blue|red) => color'"
PHP
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
            'title'=>['type'=>'text'],
            'price'=>['type'=>'float']
        ],[
            'regexp_filter' => '(blue|red) => color'
        ]);
Python
utilsApi.sql('CREATE TABLE products(title text, price float) regexp_filter = \'(blue|red) => color\'')
javascript
res = await utilsApi.sql('CREATE TABLE products(title text, price float) regexp_filter = \'(blue|red) => color\'');
Java
utilsApi.sql("CREATE TABLE products(title text, price float) regexp_filter = '(blue|red) => color'");
C#
utilsApi.Sql("CREATE TABLE products(title text, price float) regexp_filter = '(blue|red) => color'");
CONFIG
table products {
  # index '13"' as '13inch'
  regexp_filter = \b(\d+)\" => \1inch

  # index 'blue' or 'red' as 'color'
  regexp_filter = (blue|red) => color

  type = rt
  path = tbl
  rt_field = title
  rt_attr_uint = price
}
¶ 6.3.5

Wildcard searching settings

Wildcard searching is a common text search type. In Manticore, it is performed at the dictionary level. By default, both plain and RT tables use a dictionary type called dict. In this mode, words are stored as they are, so enabling wildcarding does not affect the size of the table. When a wildcard search is performed, the dictionary is searched to find all possible expansions of the wildcarded word. This expansion can be problematic in terms of computation at query time when the expanded word provides many expansions or expansions that have huge hitlists, especially in the case of infixes where the wildcard is added at the start and end of the word. To avoid such problems, the expansion_limit can be used.

min_prefix_len

min_prefix_len = length

This setting determines the minimum word prefix length to index and search. By default, it is set to 0, meaning prefixes are not allowed.

Prefixes allow for wildcard searching by wordstart* wildcards.

For example, if the word "example" is indexed with min_prefix_len=3, it can be found by searching for "exa", "exam", "examp", "exampl", as well as the full word.

Note that with dict=crc min_prefix_len will affect the size of the full-text index since each word expansion will be stored additionally.

Manticore can differentiate perfect word matches from prefix matches and rank the former higher if the following conditions are met:

Note that with either dict=crc mode or any of the above options disabled, it is not possible to differentiate between prefixes and full words, and perfect word matches cannot be ranked higher.

When the minimum infix length is set to a positive number, the minimum prefix length is always considered 1.

SQL
CREATE TABLE products(title text, price float) min_prefix_len = '3'
JSON
POST /cli -d "
CREATE TABLE products(title text, price float) min_prefix_len = '3'"
PHP
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
            'title'=>['type'=>'text'],
            'price'=>['type'=>'float']
        ],[
            'min_prefix_len' => '3'
        ]);
Python
utilsApi.sql('CREATE TABLE products(title text, price float) min_prefix_len = \'3\'')
javascript
res = await utilsApi.sql('CREATE TABLE products(title text, price float) min_prefix_len = \'3\'');
Java
utilsApi.sql("CREATE TABLE products(title text, price float) min_prefix_len = '3'");
C#
utilsApi.Sql("CREATE TABLE products(title text, price float) min_prefix_len = '3'");
CONFIG
table products {
  min_prefix_len = 3

  type = rt
  path = tbl
  rt_field = title
  rt_attr_uint = price
}

min_infix_len

min_infix_len = length

The min_infix_len setting determines the minimum length of an infix prefix to index and search. It is optional and its default value is 0, which means that infixes are not allowed. The minimum allowed non-zero value is 2.

When enabled, infixes allow for wildcard searching with term patterns like start*, *end, *middle*, , and so on. It also allows you to disable too short wildcards if they are too expensive to search for.

If the following conditions are met, Manticore can differentiate perfect word matches from infix matches and rank the former higher:

Note that with the dict=crc mode or any of the above options disabled, there is no way to differentiate between infixes and full words, and thus perfect word matches cannot be ranked higher.

Infix wildcard search query time can vary greatly, depending on how many keywords the substring will actually expand to. Short and frequent syllables like *in* or *ti* might expand to way too many keywords, all of which would need to be matched and processed. Therefore, to generally enable substring searches, you would set min_infix_len to 2. To limit the impact from wildcard searches with too short wildcards, you might set it higher.

Infixes must be at least 2 characters long, and wildcards like *a* are not allowed for performance reasons.

When min_infix_len is set to a positive number, the minimum prefix length is considered 1. For dict word infixing and prefixing cannot be both enabled at the same time. For dict and other fields to have prefixes declared with prefix_fields, it is forbidden to declare the same field in both lists.

If dict=keywords, besides the wildcard * two other wildcard characters can be used:

SQL
CREATE TABLE products(title text, price float) min_infix_len = '3'
JSON
POST /cli -d "
CREATE TABLE products(title text, price float) min_infix_len = '3'"
PHP
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
            'title'=>['type'=>'text'],
            'price'=>['type'=>'float']
        ],[
            'min_infix_len' => '3'
        ]);
Python
utilsApi.sql('CREATE TABLE products(title text, price float) min_infix_len = \'3\'')
javascript
res = await utilsApi.sql('CREATE TABLE products(title text, price float) min_infix_len = \'3\'');
Java
utilsApi.sql("CREATE TABLE products(title text, price float) min_infix_len = '3'");
C#
utilsApi.Sql("CREATE TABLE products(title text, price float) min_infix_len = '3'");
CONFIG
table products {
  min_infix_len = 3

  type = rt
  path = tbl
  rt_field = title
  rt_attr_uint = price
}

prefix_fields

prefix_fields = field1[, field2, ...]

The prefix_fields setting is used to limit prefix indexing to specific full-text fields in dict=crc mode. By default, all fields are indexed in prefix mode, but because prefix indexing can affect both indexing and searching performance, it may be desired to limit it to certain fields.

To limit prefix indexing to specific fields, use the prefix_fields setting followed by a comma-separated list of field names. If prefix_fields is not set, then all fields will be indexed in prefix mode.

CONFIG
table products {
  prefix_fields = title, name
  min_prefix_len = 3
  dict = crc

infix_fields

infix_fields = field1[, field2, ...]

The infix_fields setting allows you to specify a list of full-text fields to limit infix indexing to. This applies to dict=crc only and is optional, with the default being to index all fields in infix mode.
This setting is similar to prefix_fields, but instead allows you to limit infix indexing to specific fields.

CONFIG
table products {
  infix_fields = title, name
  min_infix_len = 3
  dict = crc

max_substring_len

max_substring_len = length

The max_substring_len directive sets the maximum substring length to be indexed for either prefix or infix searches. This setting is optional, and its default value is 0 (which means that all possible substrings are indexed). It only applies to dict.

By default, substring indexing in dict indexes all possible substrings as separate keywords, which can result in an overly large full-text index. Therefore, the max_substring_len directive allows you to skip too-long substrings that will probably never be searched for.

For example, a test table of 10,000 blog posts takes up a different amount of disk space depending on the settings:

Therefore, limiting the max substring length can save 10-15% of the table size.

When using dict=keywords mode, there is no performance impact associated with substring length. Therefore, this directive is not applicable and is intentionally forbidden in that case. However, if required, you can still limit the length of a substring that you search for in the application code.

CONFIG
table products {
  max_substring_len = 12
  min_infix_len = 3
  dict = crc

expand_keywords

expand_keywords = {0|1|exact|star}

This setting expands keywords with their exact forms and/or with stars when possible. The supported values are:

Queries against tables with expand_keywords feature enabled are internally expanded as follows: if the table was built with prefix or infix indexing enabled, every keyword gets internally replaced with a disjunction of the keyword itself and a respective prefix or infix (keyword with stars). If the table was built with both stemming and index_exact_words enabled, exact form is also added.

SQL
CREATE TABLE products(title text, price float) expand_keywords = '1'
JSON
POST /cli -d "
CREATE TABLE products(title text, price float) expand_keywords = '1'"
PHP
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
            'title'=>['type'=>'text'],
            'price'=>['type'=>'float']
        ],[
            'expand_keywords' => '1'
        ]);
Python
utilsApi.sql('CREATE TABLE products(title text, price float) expand_keywords = \'1\'')
javascript
res = await utilsApi.sql('CREATE TABLE products(title text, price float) expand_keywords = \'1\'');
Java
utilsApi.sql("CREATE TABLE products(title text, price float) expand_keywords = '1'");
C#
utilsApi.Sql("CREATE TABLE products(title text, price float) expand_keywords = '1'");
CONFIG
table products {
  expand_keywords = 1

  type = rt
  path = tbl
  rt_field = title
  rt_attr_uint = price
}

Expanded queries take naturally longer to complete, but can possibly improve the search quality, as the documents with exact form matches should be ranked generally higher than documents with stemmed or infix matches.

Note that the existing query syntax does not allow to emulate this kind of expansion, because internal expansion works on keyword level and expands keywords within phrase or quorum operators too (which is not possible through the query syntax). Take a look at the examples and how expand_keywords affects the search result weights and how "runsy" is found by "runs" w/o the need to add a star:

expand_keywords_enabled
mysql> create table t(f text) min_infix_len='2' expand_keywords='1' morphology='stem_en';
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> insert into t values(1,'running'),(2,'runs'),(3,'runsy');
Query OK, 3 rows affected (0.00 sec)

mysql> select *, weight() from t where match('runs');
+------+---------+----------+
| id   | f       | weight() |
+------+---------+----------+
|    2 | runs    |     1560 |
|    1 | running |     1500 |
|    3 | runsy   |     1500 |
+------+---------+----------+
3 rows in set (0.01 sec)

mysql> drop table t;
Query OK, 0 rows affected (0.01 sec)

mysql> create table t(f text) min_infix_len='2' expand_keywords='exact' morphology='stem_en';
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> insert into t values(1,'running'),(2,'runs'),(3,'runsy');
Query OK, 3 rows affected (0.00 sec)

mysql> select *, weight() from t where match('running');
+------+---------+----------+
| id   | f       | weight() |
+------+---------+----------+
|    1 | running |     1590 |
|    2 | runs    |     1500 |
+------+---------+----------+
2 rows in set (0.00 sec)
expand_keywords_disabled
mysql> create table t(f text) min_infix_len='2' morphology='stem_en';
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> insert into t values(1,'running'),(2,'runs'),(3,'runsy');
Query OK, 3 rows affected (0.00 sec)

mysql> select *, weight() from t where match('runs');
+------+---------+----------+
| id   | f       | weight() |
+------+---------+----------+
|    1 | running |     1500 |
|    2 | runs    |     1500 |
+------+---------+----------+
2 rows in set (0.00 sec)

mysql> drop table t;
Query OK, 0 rows affected (0.01 sec)

mysql> create table t(f text) min_infix_len='2' morphology='stem_en';
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> insert into t values(1,'running'),(2,'runs'),(3,'runsy');
Query OK, 3 rows affected (0.00 sec)

mysql> select *, weight() from t where match('running');
+------+---------+----------+
| id   | f       | weight() |
+------+---------+----------+
|    1 | running |     1500 |
|    2 | runs    |     1500 |
+------+---------+----------+
2 rows in set (0.00 sec)

This directive does not affect indexer in any way, it only affects searchd.

expansion_limit

expansion_limit = number

Maximum number of expanded keywords for a single wildcard. Details are here.

¶ 6.3.6

Ignoring stop words

Stop words are words that are ignored during indexing and searching, typically due to their high frequency and low value to search results.

Manticore Search applies stemming to stop words by default, which can lead to undesired results, but this can be turned off using the stopwords_unstemmed.

Small stop word files are stored in the table header, and there is a limit to the size of files that can be embedded, as defined by the embedded_limit option.

Stop words are not indexed, but they do affect keyword positions. For example, if "the" is a stop word, and document 1 contains the phrase "in office" while document 2 contains the phrase "in the office," searching for "in office" as an exact phrase will only return the first document, even though "the" is skipped as a stop word in the second document. This behavior can be modified using the stopword_step directive.

stopwords

stopwords=path/to/stopwords/file[ path/to/another/file ...]

The stopwords setting is optional and by default empty. It allows you to specify the path to one or more stop word files, separated by spaces. All the files will be loaded. In the real-time mode, only absolute paths are allowed.

The stop word file format is simple plain text with UTF-8 encoding. The file data will be tokenized with respect to the charset_table settings, so you can use the same separators as in the indexed data.

Stop word files can be created manually or semi-automatically. The indexer provides a mode that creates a frequency dictionary of the table, sorted by the keyword frequency. Top keywords from that dictionary can usually be used as stop words. See --buildstops and --buildfreqs switch for details. Top keywords from that dictionary can usually be used as stop words.

SQL
CREATE TABLE products(title text, price float) stopwords = '/usr/local/manticore/data/stopwords.txt /usr/local/manticore/data/stopwords-ru.txt /usr/local/manticore/data/stopwords-en.txt'
JSON
POST /cli -d "
CREATE TABLE products(title text, price float) stopwords = '/usr/local/manticore/data/stopwords.txt stopwords-ru.txt stopwords-en.txt'"
PHP
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
            'title'=>['type'=>'text'],
            'price'=>['type'=>'float']
        ],[
            'stopwords' => '/usr/local/manticore/data/stopwords.txt stopwords-ru.txt stopwords-en.txt'
        ]);
Python
utilsApi.sql('CREATE TABLE products(title text, price float) stopwords = \'/usr/local/manticore/data/stopwords.txt /usr/local/manticore/data/stopwords-ru.txt /usr/local/manticore/data/stopwords-en.txt\'')
javascript
res = await utilsApi.sql('CREATE TABLE products(title text, price float) stopwords = \'/usr/local/manticore/data/stopwords.txt /usr/local/manticore/data/stopwords-ru.txt /usr/local/manticore/data/stopwords-en.txt\'');
Java
utilsApi.sql("CREATE TABLE products(title text, price float) stopwords = '/usr/local/manticore/data/stopwords.txt /usr/local/manticore/data/stopwords-ru.txt /usr/local/manticore/data/stopwords-en.txt'");
C#
utilsApi.Sql("CREATE TABLE products(title text, price float) stopwords = '/usr/local/manticore/data/stopwords.txt /usr/local/manticore/data/stopwords-ru.txt /usr/local/manticore/data/stopwords-en.txt'");
CONFIG
table products {
  stopwords = /usr/local/manticore/data/stopwords.txt
  stopwords = stopwords-ru.txt stopwords-en.txt

  type = rt
  path = tbl
  rt_field = title
  rt_attr_uint = price
}

Alternatively you can use one of the default stop word files that come with Manticore. Currently stop words for 50 languages are available. Here is the full list of aliases for them:

For example, to use stop words for Italian language just put the following line in your config file:

SQL
CREATE TABLE products(title text, price float) stopwords = 'it'
JSON
POST /cli -d "
CREATE TABLE products(title text, price float) stopwords = 'it'"
PHP
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
            'title'=>['type'=>'text'],
            'price'=>['type'=>'float']
        ],[
            'stopwords' => 'it'
        ]);
Python
utilsApi.sql('CREATE TABLE products(title text, price float) stopwords = \'it\'')
javascript
res = await utilsApi.sql('CREATE TABLE products(title text, price float) stopwords = \'it\'');
Java
utilsApi.sql("CREATE TABLE products(title text, price float) stopwords = 'it'");
C#
utilsApi.Sql("CREATE TABLE products(title text, price float) stopwords = 'it'");
CONFIG
table products {
  stopwords = it

  type = rt
  path = tbl
  rt_field = title
  rt_attr_uint = price
}

If you need to use stop words for multiple languages you should list all their aliases, separated with commas (RT mode) or spaces (plain mode):

SQL
CREATE TABLE products(title text, price float) stopwords = 'en, it, ru'
JSON
POST /cli -d "
CREATE TABLE products(title text, price float) stopwords = 'en, it, ru'"
PHP
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
            'title'=>['type'=>'text'],
            'price'=>['type'=>'float']
        ],[
            'stopwords' => 'en, it, ru'
        ]);
Python
utilsApi.sql('CREATE TABLE products(title text, price float) stopwords = \'en, it, ru\'')
javascript
res = await utilsApi.sql('CREATE TABLE products(title text, price float) stopwords = \'en, it, ru\'');
Java
utilsApi.sql("CREATE TABLE products(title text, price float) stopwords = 'en, it, ru'");
C#
utilsApi.sql("CREATE TABLE products(title text, price float) stopwords = 'en, it, ru'");
CONFIG
table products {
  stopwords = en it ru

  type = rt
  path = tbl
  rt_field = title
  rt_attr_uint = price
}

stopword_step

stopword_step={0|1}

The position_increment setting on stopwords is optional, and the allowed values are 0 and 1, with the default being 1.

SQL
CREATE TABLE products(title text, price float) stopwords = 'en' stopword_step = '1'
JSON
POST /cli -d "
CREATE TABLE products(title text, price float) stopwords = 'en' stopword_step = '1'"
PHP
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
            'title'=>['type'=>'text'],
            'price'=>['type'=>'float']
        ],[
            'stopwords' => 'en, it, ru',
            'stopword_step' => '1'
        ]);
Python
utilsApi.sql('CREATE TABLE products(title text, price float) stopwords = \'en\' stopword_step = \'1\'')
javascript
res = await utilsApi.sql('CREATE TABLE products(title text, price float) stopwords = \'en\' stopword_step = \'1\'');
Java
utilsApi.sql("CREATE TABLE products(title text, price float) stopwords = \'en\' stopword_step = \'1\'");
C#
utilsApi.sql("CREATE TABLE products(title text, price float) stopwords = \'en\' stopword_step = \'1\'");
CONFIG
table products {
  stopwords = en
  stopword_step = 1

  type = rt
  path = tbl
  rt_field = title
  rt_attr_uint = price
}

stopwords_unstemmed

stopwords_unstemmed={0|1}

Whether to apply stop words before or after stemming. Optional, default is 0 (apply stop word filter after stemming).

By default, stop words are stemmed themselves, and then applied to tokens after stemming (or any other morphology processing). This means that a token is stopped when stem(token) is equal to stem(stopword). This default behavior can lead to unexpected results when a token is erroneously stemmed to a stopped root. For example, "Andes" might get stemmed to "and", so when "and" is a stopword, "Andes" is also skipped.

However, you can change this behavior by enabling the stopwords_unstemmed directive. When this is enabled, stop words are applied before stemming (and therefore to the original word forms), and the tokens are skipped when the token is equal to the stopword.

SQL
CREATE TABLE products(title text, price float) stopwords = 'en' stopwords_unstemmed = '1'
JSON
POST /cli -d "
CREATE TABLE products(title text, price float) stopwords = 'en' stopwords_unstemmed = '1'"
PHP
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
            'title'=>['type'=>'text'],
            'price'=>['type'=>'float']
        ],[
            'stopwords' => 'en, it, ru',
            'stopwords_unstemmed' => '1'
        ]);
Python
utilsApi.sql('CREATE TABLE products(title text, price float) stopwords = \'en\' stopwords_unstemmed = \'1\'')
javascript
res = await utilsApi.sql('CREATE TABLE products(title text, price float) stopwords = \'en\' stopwords_unstemmed = \'1\'');
Java
utilsApi.sql("CREATE TABLE products(title text, price float) stopwords = \'en\' stopwords_unstemmed = \'1\'");
C#
utilsApi.Sql("CREATE TABLE products(title text, price float) stopwords = \'en\' stopwords_unstemmed = \'1\'");
CONFIG
table products {
  stopwords = en
  stopwords_unstemmed = 1

  type = rt
  path = tbl
  rt_field = title
  rt_attr_uint = price
}
¶ 6.3.7

Word forms

Word forms are applied after tokenizing incoming text by charset_table rules. They essentially let you replace one word with another. Normally, that would be used to bring different word forms to a single normal form (e.g. to normalize all the variants such as "walks", "walked", "walking" to the normal form "walk"). It can also be used to implement stemming exceptions, because stemming is not applied to words found in the forms list.

wordforms

wordforms = path/to/wordforms.txt
wordforms = path/to/alternateforms.txt
wordforms = path/to/dict*.txt

Word forms dictionary. Optional, default is empty.

The dictionaries are used to normalize incoming words both during indexing and searching. Therefore, when it comes to a plain table, it's required to rotate the table in order to pick up changes in the word forms file.

Word forms support in Manticore is designed to handle large dictionaries well. They moderately affect indexing speed; for example, a dictionary with 1 million entries slows down indexing by about 1.5 times. Searching speed is not affected at all. The additional RAM impact is roughly equal to the dictionary file size, and dictionaries are shared across tables. For instance, if the very same 50 MB word forms file is specified for 10 different tables, the additional searchd RAM usage will be about 50 MB.

The dictionary file should be in a simple plain text format. Each line should contain source and destination word forms, in UTF-8 encoding, separated by a "greater than" sign. Rules from the charset_table will be applied when the file is loaded, so if you are using built-in charset_table options, it is typically case-insensitive, just like your other full-text indexed data. Here is a sample file contents:

walks > walk
walked > walk
walking > walk

There is a bundled utility called Spelldump that helps you create a dictionary file in a format that Manticore can read. The utility can read from source .dict and .aff dictionary files in the ispell or MySpell format, as bundled with OpenOffice.

You can map several source words to a single destination word. The process happens on tokens, not the source text, so differences in whitespace and markup are ignored.

You can use the => symbol instead of >. Comments (starting with #) are also allowed. Finally, if a line starts with a tilde (~), the wordform will be applied after morphology, instead of before (note that only a single source and destination word are supported in this case).

core 2 duo > c2d
e6600 > c2d
core 2duo => c2d # Some people write '2duo' together...
~run > walk # Along with stem_en morphology enabled replaces 'run', 'running', 'runs' (and any other words that stem to just 'run') to 'walk'

You can specify multiple destination tokens:

s02e02 > season 2 episode 2
s3 e3 > season 3 episode 3

You can specify multiple files, not just one. Masks can be used as a pattern, and all matching files will be processed in simple ascending order.

In the RT mode, only absolute paths are allowed.

If multi-byte codepages are used and file names include foreign characters, the resulting order may not be exactly alphabetic. If the same wordform definition is found in multiple files, the latter one is used and overrides previous definitions.

SQL
CREATE TABLE products(title text, price float) wordforms = '/var/lib/manticore/wordforms.txt' wordforms = '/var/lib/manticore/alternateforms.txt /var/lib/manticore/dict*.txt'
JSON
POST /cli -d "
CREATE TABLE products(title text, price float) wordforms = '/var/lib/manticore/wordforms.txt' wordforms = '/var/lib/manticore/alternateforms.txt' wordforms = '/var/lib/manticore/dict*.txt'"
PHP
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
            'title'=>['type'=>'text'],
            'price'=>['type'=>'float']
        ],[
            'wordforms' => [
                '/var/lib/manticore/wordforms.txt',
                '/var/lib/manticore/alternateforms.txt',
                '/var/lib/manticore/dict*.txt'
            ]
        ]);
Python
utilsApi.sql('CREATE TABLE products(title text, price float) wordforms = \'/var/lib/manticore/wordforms.txt\' wordforms = \'/var/lib/manticore/alternateforms.txt\' wordforms = \'/var/lib/manticore/dict*.txt\'')
javascript
res = await utilsApi.sql('CREATE TABLE products(title text, price float)wordforms = \'/var/lib/manticore/wordforms.txt\' wordforms = \'/var/lib/manticore/alternateforms.txt\' wordforms = \'/var/lib/manticore/dict*.txt\'');
Java
utilsApi.sql("CREATE TABLE products(title text, price float) wordforms = '/var/lib/manticore/wordforms.txt' wordforms = '/var/lib/manticore/alternateforms.txt' wordforms = '/var/lib/manticore/dict*.txt'");
C#
utilsApi.Sql("CREATE TABLE products(title text, price float) wordforms = '/var/lib/manticore/wordforms.txt' wordforms = '/var/lib/manticore/alternateforms.txt' wordforms = '/var/lib/manticore/dict*.txt'");
CONFIG
table products {
  wordforms = /var/lib/manticore/wordforms.txt
  wordforms = /var/lib/manticore/alternateforms.txt
  wordforms = /var/lib/manticore/dict*.txt

  type = rt
  path = tbl
  rt_field = title
  rt_attr_uint = price
}
¶ 6.3.8

Exceptions

Exceptions (also known as synonyms) allow mapping one or more tokens (including tokens with characters that would normally be excluded) to a single keyword. They are similar to wordforms in that they also perform mapping but have a number of important differences.

A short summary of the differences from wordforms is as follows:

Exceptions Word forms
Case sensitive Case insensitive
Can use special characters that are not in charset_table Fully obey charset_table
Underperform on huge dictionaries Designed to handle millions of entries

exceptions

exceptions = path/to/exceptions.txt

Tokenizing exceptions file. Optional, the default is empty.
In the RT mode, only absolute paths are allowed.

The expected file format is plain text, with one line per exception. The line format is as follows:

map-from-tokens => map-to-token

Example file:

at & t => at&t
AT&T => AT&T
Standarten   Fuehrer => standartenfuhrer
Standarten Fuhrer => standartenfuhrer
MS Windows => ms windows
Microsoft Windows => ms windows
C++ => cplusplus
c++ => cplusplus
C plus plus => cplusplus

All tokens here are case sensitive and will not be processed by charset_table rules. Thus, with the example exceptions file above, the "at&t" text will be tokenized as two keywords "at" and "t" due to lowercase letters. On the other hand, "AT&T" will match exactly and produce a single "AT&T" keyword.

Note that this map-to keyword:

In our sample, "ms windows" query will not match the document with "MS Windows" text. The query will be interpreted as a query for two keywords, "ms" and "windows". The mapping for "MS Windows" is a single keyword "ms windows", with a space in the middle. On the other hand, "standartenfuhrer" will retrieve documents with "Standarten Fuhrer" or "Standarten Fuehrer" contents (capitalized exactly like this), or any capitalization variant of the keyword itself, e.g., "staNdarTenfUhreR". (It won't catch "standarten fuhrer", however: this text does not match any of the listed exceptions because of case sensitivity and gets indexed as two separate keywords.)

The whitespace in the map-from tokens list matters, but its amount does not. Any amount of whitespace in the map-form list will match any other amount of whitespace in the indexed document or query. For instance, the "AT & T" map-from token will match "AT & T" text, whatever the amount of space in both map-from part and the indexed text. Such text will, therefore, be indexed as a special "AT&T" keyword, thanks to the very first entry from the sample.

Exceptions also allow capturing special characters (that are exceptions from general charset_table rules; hence the name). Assume that you generally do not want to treat '+' as a valid character, but still want to be able to search for some exceptions from this rule such as 'C++'. The sample above will do just that, totally independent of what characters are in the table and what are not.

Therefore, when it comes to a plain table, it's required to rotate the table in order to pick up changes in the exceptions file.

SQL
CREATE TABLE products(title text, price float) exceptions = '/usr/local/manticore/data/exceptions.txt'
JSON
POST /cli -d "
CREATE TABLE products(title text, price float) exceptions = '/usr/local/manticore/data/exceptions.txt'"
PHP
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
            'title'=>['type'=>'text'],
            'price'=>['type'=>'float']
        ],[
            'exceptions' => '/usr/local/manticore/data/exceptions.txt'
        ]);
Python
utilsApi.sql('CREATE TABLE products(title text, price float) exceptions = \'/usr/local/manticore/data/exceptions.txt\'')
javascript
res = await utilsApi.sql('CREATE TABLE products(title text, price float) exceptions = \'/usr/local/manticore/data/exceptions.txt\'');
Java
utilsApi.sql("CREATE TABLE products(title text, price float) exceptions = '/usr/local/manticore/data/exceptions.txt'");
C#
utilsApi.Sql("CREATE TABLE products(title text, price float) exceptions = '/usr/local/manticore/data/exceptions.txt'");
CONFIG
table products {
  exceptions = /usr/local/manticore/data/exceptions.txt

  type = rt
  path = tbl
  rt_field = title
  rt_attr_uint = price
}
¶ 6.3.9

Advanced morphology

Morphology preprocessors can be applied to words during indexing to normalize different forms of the same word and improve segmentation. For example, an English stemmer can normalize "dogs" and "dog" to "dog", resulting in identical search results for both keywords.

Manticore has four built-in morphology preprocessors:

morphology

morphology = morphology1[, morphology2, ...]

The morphology directive specifies a list of morphology preprocessors to apply to the words being indexed. This is an optional setting, with the default being no preprocessor applied.

Manticore comes with built-in morphological preprocessors for:

Lemmatizers require dictionary .pak files that can be downloaded from the Manticore website. The dictionaries need to be put in the directory specified by lemmatizer_base. Additionally, the lemmatizer_cache setting can be used to speed up lemmatizing by spending more RAM for an uncompressed dictionary cache.

The Chinese language segmentation can be performed using ICU. It provides more precise segmentation compared to n-grams but is slightly slower. The charset_table must include all Chinese characters, which can be done by using the "cjk" alias. When "morphology=icu_chinese" is specified, the documents are first pre-processed by ICU. Then, the result is processed by the tokenizer according to the charset_table, and finally, other morphology processors specified in the "morphology" option are applied. Only those parts of texts that contain Chinese are passed to ICU for segmentation, while others can be modified by different means such as different morphologies or charset_table.

Built-in English and Russian stemmers are faster than their libstemmer counterparts but may produce slightly different results

Soundex implementation matches that of MySQL. Metaphone implementation is based on Double Metaphone algorithm and indexes the primary code.

To use the morphology option, specify one or multiple of the built-in options, including:

Multiple stemmers can be specified, separated by commas. They will be applied to incoming words in the order they are listed, and the processing will stop once one of the stemmers modifies the word. Additionally, when wordforms feature is enabled, the word will be looked up in the word forms dictionary first. If there is a matching entry in the dictionary, stemmers will not be applied at all. wordforms сan be used to implement stemming exceptions.

SQL
CREATE TABLE products(title text, price float) morphology = 'stem_en, libstemmer_sv'
JSON
POST /cli -d "CREATE TABLE products(title text, price float)  morphology = 'stem_en, libstemmer_sv'"
PHP
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
            'title'=>['type'=>'text'],
            'price'=>['type'=>'float']
        ],[
            'morphology' => 'stem_en, libstemmer_sv'
        ]);
Python
utilsApi.sql('CREATE TABLE products(title text, price float) morphology = \'stem_en, libstemmer_sv\'')
javascript
res = await utilsApi.sql('CREATE TABLE products(title text, price float) morphology = \'stem_en, libstemmer_sv\'');
Java
utilsApi.sql("CREATE TABLE products(title text, price float) morphology = 'stem_en, libstemmer_sv'");
C#
utilsApi.Sql("CREATE TABLE products(title text, price float) morphology = 'stem_en, libstemmer_sv'");
CONFIG
table products {
  morphology = stem_en, libstemmer_sv

  type = rt
  path = tbl
  rt_field = title
  rt_attr_uint = price
}

morphology_skip_fields

morphology_skip_fields = field1[, field2, ...]

A list of fields to skip morphology preprocessing. Optional, default is empty (apply preprocessors to all fields).

SQL
CREATE TABLE products(title text, name text, price float) morphology_skip_fields = 'name' morphology = 'stem_en'
JSON
POST /cli -d "
CREATE TABLE products(title text, name text, price float) morphology_skip_fields = 'name' morphology = 'stem_en'"
PHP
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
            'title'=>['type'=>'text'],
            'price'=>['type'=>'float']
        ],[
            'morphology_skip_fields' => 'name',
            'morphology' => 'stem_en'
        ]);
Python
utilsApi.sql('CREATE TABLE products(title text, price float) morphology_skip_fields = \'name\' morphology = \'stem_en\'')
javascript
res = await utilsApi.sql('CREATE TABLE products(title text, price float) morphology_skip_fields = \'name\' morphology = \'stem_en\'');
Java
utilsApi.sql("CREATE TABLE products(title text, price float) morphology_skip_fields = 'name' morphology = 'stem_en'");
C#
utilsApi.Sql("CREATE TABLE products(title text, price float) morphology_skip_fields = 'name' morphology = 'stem_en'");
CONFIG
table products {
  morphology_skip_fields = name
  morphology = stem_en

  type = rt
  path = tbl
  rt_field = title
  rt_field = name
  rt_attr_uint = price
}

min_stemming_len

min_stemming_len = length

Minimum word length at which to enable stemming. Optional, default is 1 (stem everything).

Stemmers are not perfect, and might sometimes produce undesired results. For instance, running "gps" keyword through Porter stemmer for English results in "gp", which is not really the intent. min_stemming_len feature lets you suppress stemming based on the source word length, ie. to avoid stemming too short words. Keywords that are shorter than the given threshold will not be stemmed. Note that keywords that are exactly as long as specified will be stemmed. So in order to avoid stemming 3-character keywords, you should specify 4 for the value. For more finely grained control, refer to wordforms feature.

SQL
CREATE TABLE products(title text, price float) min_stemming_len = '4' morphology = 'stem_en'
JSON
POST /cli -d "
CREATE TABLE products(title text, price float) min_stemming_len = '4' morphology = 'stem_en'"
PHP
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
            'title'=>['type'=>'text'],
            'price'=>['type'=>'float']
        ],[
             'min_stemming_len' => '4',
             'morphology' => 'stem_en'
        ]);
Python
utilsApi.sql('CREATE TABLE products(title text, price float) min_stemming_len = \'4\' morphology = \'stem_en\'')
javascript
res = await utilsApi.sql('CREATE TABLE products(title text, price float) min_stemming_len = \'4\' morphology = \'stem_en\'');
Java
utilsApi.sql("CREATE TABLE products(title text, price float) min_stemming_len = '4' morphology = 'stem_en'");
C#
utilsApi.Sql("CREATE TABLE products(title text, price float) min_stemming_len = '4' morphology = 'stem_en'");
CONFIG
table products {
  min_stemming_len = 4
  morphology = stem_en

  type = rt
  path = tbl
  rt_field = title
  rt_attr_uint = price
}

index_exact_words

index_exact_words = {0|1}

This option allows for the indexing of original keywords along with their morphologically modified versions. However, original keywords that are remapped by the wordforms and exceptions cannot be indexed. The default value is 0, indicating that this feature is disabled by default.

This allows the use of the exact form operator in the query language. Enabling this feature will increase the full-text index size and indexing time, but will not impact search performance.

SQL
CREATE TABLE products(title text, price float) index_exact_words = '1' morphology = 'stem_en'
JSON
POST /cli -d "
CREATE TABLE products(title text, price float) index_exact_words = '1' morphology = 'stem_en'"
PHP
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
            'title'=>['type'=>'text'],
            'price'=>['type'=>'float']
        ],[
             'index_exact_words' => '1',
             'morphology' => 'stem_en'
        ]);
Python
utilsApi.sql('CREATE TABLE products(title text, price float) index_exact_words = \'1\' morphology = \'stem_en\'')
javascript
res = await utilsApi.sql('CREATE TABLE products(title text, price float) index_exact_words = \'1\' morphology = \'stem_en\'');
Java
utilsApi.sql("CREATE TABLE products(title text, price float) index_exact_words = '1' morphology = 'stem_en'");
C#
utilsApi.Sql("CREATE TABLE products(title text, price float) index_exact_words = '1' morphology = 'stem_en'");
CONFIG
table products {
  index_exact_words = 1
  morphology = stem_en

  type = rt
  path = tbl
  rt_field = title
  rt_attr_uint = price
}
¶ 6.3.10

Advanced HTML tokenization

Stripping HTML tags

html_strip

html_strip = {0|1}

This option determines whether HTML markup should be stripped from the incoming full-text data. The default value is 0, which disables stripping. To enable stripping, set the value to 1.

HTML tags and entities are considered as markup and will be processed.

HTML tags are removed, while the contents between them (e.g. everything between <p> and </p>) are left intact. You can choose to keep and index tag attributes (e.g. HREF attribute in an A tag or ALT in an IMG tag). Some well-known inline tags, such as A, B, I, S, U, BASEFONT, BIG, EM, FONT, IMG, LABEL, SMALL, SPAN, STRIKE, STRONG, SUB, SUP, and TT, are completely removed. All other tags are treated as block level and are replaced with whitespace. For example, the text te<b>st</b> will be indexed as a single keyword 'test', while te<p>st</p> will be indexed as two keywords 'te' and 'st'.

HTML entities are decoded and replaced with their corresponding UTF-8 characters. The stripper supports both numeric forms (e.g. &#239;) and text forms (e.g. &oacute; or &nbsp;) of entities, and supports all entities specified by the HTML4 standard.

The stripper is designed to work with properly formed HTML and XHTML, but may produce unexpected results on malformed input (such as HTML with stray <'s or unclosed >'s).

Please note that only the tags themselves, as well as HTML comments, are stripped. To strip the contents of the tags, including embedded scripts, see the html_remove_elements option. There are no restrictions on tag names, meaning that everything that looks like a valid tag start, end, or comment will be stripped.

SQL
CREATE TABLE products(title text, price float) html_strip = '1'
JSON
POST /cli -d "
CREATE TABLE products(title text, price float) html_strip = '1'"
PHP
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
            'title'=>['type'=>'text'],
            'price'=>['type'=>'float']
        ],[
            'html_strip' => '1'
        ]);
Python
utilsApi.sql('CREATE TABLE products(title text, price float) html_strip = \'1\'')
javascript
res = await utilsApi.sql('CREATE TABLE products(title text, price float) html_strip = \'1\'');
Java
utilsApi.sql("CREATE TABLE products(title text, price float) html_strip = '1'");
C#
utilsApi.Sql("CREATE TABLE products(title text, price float) html_strip = '1'");
CONFIG
table products {
  html_strip = 1

  type = rt
  path = tbl
  rt_field = title
  rt_attr_uint = price
}

html_index_attrs

html_index_attrs = img=alt,title; a=title;

The html_index_attrs option allows you to specify which HTML markup attributes should be indexed even though other HTML markup is stripped. The default value is empty, meaning no attributes will be indexed.
The format of the option is a per-tag enumeration of indexable attributes, as demonstrated in the example above. The contents of the specified attributes will be retained and indexed, providing a way to extract additional information from your full-text data.

SQL
CREATE TABLE products(title text, price float) html_index_attrs = 'img=alt,title; a=title;' html_strip = '1'
JSON
POST /cli -d "
CREATE TABLE products(title text, price float) html_index_attrs = 'img=alt,title; a=title;' html_strip = '1'"
PHP
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
            'title'=>['type'=>'text'],
            'price'=>['type'=>'float']
        ],[
            'html_index_attrs' => 'img=alt,title; a=title;',
            'html_strip' => '1'
        ]);
Python
utilsApi.sql('CREATE TABLE products(title text, price float) html_index_attrs = \'img=alt,title; a=title;\' html_strip = \'1\'')
javascript
res = await utilsApi.sql('CREATE TABLE products(title text, price float) html_index_attrs = \'img=alt,title; a=title;\' html_strip = \'1\'');
Java
utilsApi.sql("CREATE TABLE products(title text, price float) html_index_attrs = \'img=alt,title; a=title;\' html_strip = '1'");
C#
utilsApi.Sql("CREATE TABLE products(title text, price float) html_index_attrs = \'img=alt,title; a=title;\' html_strip = '1'");
CONFIG
table products {
  html_index_attrs = img=alt,title; a=title;
  html_strip = 1

  type = rt
  path = tbl
  rt_field = title
  rt_attr_uint = price
}

html_remove_elements

html_remove_elements = element1[, element2, ...]

A list of HTML elements whose contents, along with the elements themselves, will be stripped. Optional, the default is an empty string (do not strip contents of any elements).

This option allows you to remove the contents of elements, meaning everything between the opening and closing tags. It is useful for removing embedded scripts, CSS, etc. The short tag form for empty elements (e.g.
) is properly supported, and the text following such a tag will not be removed.

The value is a comma-separated list of element (tag) names, the contents of which should be removed. Tag names are case-insensitive.

SQL
CREATE TABLE products(title text, price float) html_remove_elements = 'style, script' html_strip = '1'
JSON
POST /cli -d "
CREATE TABLE products(title text, price float) html_remove_elements = 'style, script' html_strip = '1'"
PHP
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
            'title'=>['type'=>'text'],
            'price'=>['type'=>'float']
        ],[
            'html_remove_elements' => 'style, script',
            'html_strip' => '1'
        ]);
Python
utilsApi.sql('CREATE TABLE products(title text, price float) html_remove_elements = \'style, script\' html_strip = \'1\'')
javascript
res = await utilsApi.sql('CREATE TABLE products(title text, price float) html_remove_elements = \'style, script\' html_strip = \'1\'');
Java
utilsApi.sql("CREATE TABLE products(title text, price float) html_remove_elements = \'style, script\' html_strip = '1'");
C#
utilsApi.Sql("CREATE TABLE products(title text, price float) html_remove_elements = \'style, script\' html_strip = '1'");
CONFIG
table products {
  html_remove_elements = style, script
  html_strip = 1

  type = rt
  path = tbl
  rt_field = title
  rt_attr_uint = price
}

Extracting important parts from HTML

index_sp

index_sp = {0|1}

Controls detection and indexing of sentence and paragraph boundaries. Optional, default is 0 (no detection or indexing).

This directive enables the detection and indexing of sentence and paragraph boundaries, making it possible for the SENTENCE and PARAGRAPH operators to work. Sentence boundary detection is based on plain text analysis, and only requires setting index_sp = 1 to enable it. Paragraph detection, however, relies on HTML markup and occurs during the [HTML stripping process](#html_strip. As such, to index paragraph boundaries, both the index_sp directive and the html_strip directive must be set to 1.

The following rules are used to determine sentence boundaries:

Paragraph boundaries are detected at every block-level HTML tag, including: ADDRESS, BLOCKQUOTE, CAPTION, CENTER, DD, DIV, DL, DT, H1, H2, H3, H4, H5, LI, MENU, OL, P, PRE, TABLE, TBODY, TD, TFOOT, TH, THEAD, TR, and UL.

Both sentences and paragraphs increment the keyword position counter by 1.

SQL
CREATE TABLE products(title text, price float) index_sp = '1' html_strip = '1'
JSON
POST /cli -d "
CREATE TABLE products(title text, price float) index_sp = '1' html_strip = '1'"
PHP
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
            'title'=>['type'=>'text'],
            'price'=>['type'=>'float']
        ],[
            'index_sp' => '1',
            'html_strip' => '1'
        ]);
Python
utilsApi.sql('CREATE TABLE products(title text, price float) index_sp = \'1\' html_strip = \'1\'')
javascript
res = await utilsApi.sql('CREATE TABLE products(title text, price float) index_sp = \'1\' html_strip = \'1\'');
Java
utilsApi.sql("CREATE TABLE products(title text, price float) index_sp = \'1\' html_strip = '1'");
C#
utilsApi.Sql("CREATE TABLE products(title text, price float) index_sp = \'1\' html_strip = '1'");
CONFIG
table products {
  index_sp = 1
  html_strip = 1

  type = rt
  path = tbl
  rt_field = title
  rt_attr_uint = price
}

index_zones

index_zones = h*, th, title

A list of HTML/XML zones within a field to be indexed. The default is an empty string (no zones will be indexed).

A "zone" is defined as everything between an opening and a matching closing tag, and all spans sharing the same tag name are referred to as a "zone." For example, everything between <H1> and </H1> in a document field belongs to the H1 zone.

The index_zones directive enables zone indexing, but the HTML stripper must also be enabled (by setting html_strip = 1). The value of index_zones should be a comma-separated list of tag names and wildcards (ending with a star) to be indexed as zones.

Zones can be nested and overlap, as long as every opening tag has a matching tag. Zones can also be used for matching with the ZONE operator, as described in the extended_query_syntax.

SQL
CREATE TABLE products(title text, price float) index_zones = 'h, th, title' html_strip = '1'
JSON
POST /cli -d "
CREATE TABLE products(title text, price float) index_zones = 'h, th, title' html_strip = '1'"
PHP
$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
            'title'=>['type'=>'text'],
            'price'=>['type'=>'float']
        ],[
            'index_zones' => 'h*,th,title',
            'html_strip' => '1'
        ]);
Python
utilsApi.sql('CREATE TABLE products(title text, price float) index_zones = \'h, th, title\' html_strip = \'1\'')
javascript
res = await utilsApi.sql('CREATE TABLE products(title text, price float) index_zones = \'h, th, title\' html_strip = \'1\'');
Java
utilsApi.sql("CREATE TABLE products(title text, price float) index_zones = 'h, th, title' html_strip = '1'");
C#
utilsApi.Sql("CREATE TABLE products(title text, price float) index_zones = 'h, th, title' html_strip = '1'");
CONFIG
table products {
  index_zones = h*, th, title
  html_strip = 1

  type = rt
  path = tbl
  rt_field = title
  rt_attr_uint = price
}
¶ 6.4

Creating a distributed table

Manticore allows for the creation of distributed tables, which act like regular plain or real-time tables, but are actually a collection of child tables used for searching. When a query is sent to a distributed table, it is distributed among all tables in the collection. The server then collects and processes the responses to sort and recalculate values of aggregates, if necessary.

From the client's perspective, it appears as if they are querying a single table.

Distributed tables can be composed of any combination of tables, including:

Mixing percolate and template tables with plain and real-time tables is not recommended.

A distributed table is defined as type 'distributed' in the configuration file or through the SQL clause CREATE TABLE

In a configuration file

table foo {
    type = distributed
    local = bar
    local = bar1, bar2
    agent = 127.0.0.1:9312:baz
    agent = host1|host2:tbl
    agent = host1:9301:tbl1|host2:tbl2 [ha_strategy=random retry_count=10]
    ...
}

Via SQL

CREATE TABLE distributed_index type='distributed' local='local_index' agent='127.0.0.1:9312:remote_index'

Children

The essence of a distributed table lies in its list of child tables, to which it points. There are two types of child tables in a distributed table:

  1. Local tables: These are tables that are served within the same server as the distributed table. To enumerate local tables, you use the syntax local =. You can list several local tables using multiple local = lines, or combine them into one list separated by commas.

  2. Remote tables: These are tables that are served anywhere outside the server. To enumerate remote tables, you use the syntax agent =. Each line represents one endpoint or agent. Each agent can have multiple external locations and options for how it should work. More details here. It is important to note that the server does not have any information about the type of table it is working with. This may lead to errors if, for example, you issue a CALL PQ to a remote table 'foo' that is not a percolate table.

¶ 6.4.1

Creating a local distributed table

A distributed table in Manticore Search acts as a "master node" that proxies the demanded query to other tables and provides merged results from the responses it receives. The table doesn't hold any data on its own. It can connect to both local tables and tables located on other servers. Here's an example of a simple distributed table:

Configuration file
table index_dist {
  type  = distributed
  local = index1
  local = index2
  ...
 }
RT mode
CREATE TABLE local_dist type='distributed' local='index1' local='index2';
PHP
$params = [
    'body' => [
        'settings' => [
            'type' => 'distributed',
            'local' => [
                'index1',
                'index2'
            ]
        ]
    ],
    'index' => 'products'
];
$index = new \Manticoresearch\Index($client);
$index->create($params);
Python
utilsApi.sql('CREATE TABLE local_dist type=\'distributed\' local=\'index1\' local=\'index2\'')
javascript
res = await utilsApi.sql('CREATE TABLE local_dist type=\'distributed\' local=\'index1\' local=\'index2\'');
Java
utilsApi.sql("CREATE TABLE local_dist type='distributed' local='index1' local='index2'");
C#
utilsApi.Sql("CREATE TABLE local_dist type='distributed' local='index1' local='index2'");
¶ 6.4.2

Remote tables

A remote table in Manticore Search is represented by the agent prefix in the definition of a distributed table. A distributed table can include a combination of local and remote tables. If there are no local tables provided, the distributed table will be purely remote and serve as a proxy only. For example, you might have a Manticore instance that listens on multiple ports and serves different protocols, and then redirects queries to backend servers that only accept connections via Manticore's internal binary protocol, using persistent connections to reduce the overhead of establishing connections.
Even though a purely remote distributed table doesn't serve local tables itself, it still consumes machine resources, as it still needs to perform final calculations, such as merging results and calculating final aggregated values.

agent

agent = address1 [ | address2 [...] ][:table-list]
agent = address1[:table-list [ | address2[:table-list [...] ] ] ]

agent directive declares the remote agents that are searched each time the enclosing distributed table is searched. These agents are essentially pointers to networked tables. The value specified includes the address and can also include multiple alternatives (agent mirrors) for either the address only or the address and table list.

The address specification must be one of the following:

address = hostname[:port] # eg. server2:9312
address = /absolute/unix/socket/path # eg. /var/run/manticore2.sock

The hostname is the remote host name, port is the remote TCP port number, table-list is a comma-separated list of table names, and square brackets [] indicate an optional clause.

If the table name is omitted, it is assumed to be the same table as the one where this line is defined. In other words, when defining agents for the 'mycoolindex' distributed table, you can simply point to the address, and it will be assumed that you are querying the mycoolindex table on the agent's endpoints.

If the port number is omitted, it is assumed to be 9312. If it is defined but invalid (e.g. 70000), the agent will be skipped.

You can point each agent to one or more remote tables residing on one or more networked servers with no restrictions. This enables several different usage modes:

All agents are searched in parallel. The index list is passed verbatim to the remote agent. The exact way that list is searched within the agent (i.e. sequentially or in parallel) depends solely on the agent's configuration (see the threads setting). The master has no remote control over this.

It is important to note that the LIMIT, option is ignored in agent queries. This is because each agent can contain different tables, so it is the responsibility of the client to apply the limit to the final result set. This is why the query to a physical table is different from the query to a distributed table when viewed in the query logs. The query cannot be a simple copy of the original query, as this would not produce the correct results.

For example, if a client makes a query SELECT ... LIMIT 10, 10, and there are two agents, with the second agent having only 10 documents, broadcasting the original LIMIT 10, 10 query would result in receiving 0 documents from the second agent. However, LIMIT 10,10 should return documents 10-20 from the resulting set. To resolve this, the query must be sent to the agents with a broader limit, such as the default max_matches value of 1000.

For instance, if there is a distributed table dist that refers to the remote table user, a client query SELECT * FROM dist LIMIT 10,10 would be converted to SELECT * FROM user LIMIT 0,1000 and sent to the remote table user. Once the distributed table receives the result, it will apply the LIMIT 10,10 and return the requested 10 documents.

SELECT * FROM dist LIMIT 10,10;

the query will be converted to:

SELECT * FROM user LIMIT 0,1000

Additionally, the value can specify options for each individual agent, such as:

agent = address1:table-list[[ha_strategy=value, conn=value, blackhole=value]]

Example:

# config on box1
# sharding a table over 3 servers
agent = box2:9312:shard1
agent = box3:9312:shard2

# config on box2
# sharding a table over 3 servers
agent = box1:9312:shard2
agent = box3:9312:shard3

# config on box3
# sharding a table over 3 servers
agent = box1:9312:shard1
agent = box2:9312:shard3

# per agent options
agent = box1:9312:shard1[ha_strategy=nodeads]
agent = box2:9312:shard2[conn=pconn]
agent = box2:9312:shard2[conn=pconn,ha_strategy=nodeads]
agent = test:9312:any[blackhole=1]
agent = test:9312|box2:9312|box3:9312:any2[retry_count=2]
agent = test:9312|box2:9312:any2[retry_count=2,conn=pconn,ha_strategy=noerrors]

For optimal performance, it's recommended to place remote tables that reside on the same server within the same record. For instance, instead of:

agent = remote:9312:idx1
agent = remote:9312:idx2

you should prefer:

agent = remote:9312:idx1,idx2

agent_persistent

agent_persistent = remotebox:9312:index2

The agent_persistent option allows you to persistently connect to an agent, meaning the connection will not be dropped after a query is executed. The syntax for this directive is the same as the agent directive. However, instead of opening a new connection to the agent for each query and then closing it, the master will keep a connection open and reuse it for subsequent queries. The maximum number of persistent connections per agent host is defined by the persistent_connections_limit option in the searchd section.

It's important to note that the persistent_connections_limit must be set to a value greater than 0 in order to use persistent agent connections. If it's not defined, it defaults to 0, and the agent_persistent directive will act the same as the agentdirective.

Using persistent master-agent connections reduces TCP port pressure and saves time on connection handshakes, making it more efficient.

agent_blackhole

agent_blackhole = testbox:9312:testindex1,testindex2

The agent_blackhole directive allows you to forward queries to remote agents without waiting for or processing their responses. This is useful for debugging or testing production clusters, as you can set up a separate debugging/testing instance and forward requests to it from the production master (aggregator) instance, without interfering with production work. The master searchd will attempt to connect to the blackhole agent and send queries as normal, but will not wait for or process any responses, and all network errors on the blackhole agents will be ignored. The format of the value is identical to that of the regular agent directive.

agent_connect_timeout

agent_connect_timeout = 300

The agent_connect_timeout directive defines the timeout for connecting to remote agents. By default, the value is assumed to be in milliseconds, but can have another suffix). The default value is 1000 (1 second).

When connecting to remote agents, searchd will wait for this amount of time at most to complete the connection successfully. If the timeout is reached but the connection has not been established, and retries are enabled, a retry will be initiated.

agent_query_timeout

agent_query_timeout = 10000 # our query can be long, allow up to 10 sec

The agent_query_timeout sets the amount of time that searchd will wait for a remote agent to complete a query. The default value is 3000 milliseconds (3 seconds), but can be suffixed to indicate a different unit of time.

After establishing a connection, searchd will wait for a maximum of agent_query_timeout for remote queries to complete. Note that this timeout is separate from the agent_connection_timeout and the total possible delay caused by a remote agent will be the sum of both values. If the agent_query_timeout is reached, the query will not be retried, instead, a warning will be produced.

Note that behavior is also affected by reset_network_timeout_on_packet

agent_retry_count

The agent_retry_count is an integer that specifies how many times Manticore will attempt to connect and query remote agents in a distributed table before reporting a fatal query error. It works similarly to agent_retry_count defined in the "searchd" section of the configuration file but applies specifically to the table.

mirror_retry_count

mirror_retry_count serves the same purpose as agent_retry_count. If both values are provided, mirror_retry_count will take precedence, and a warning will be raised.

Instance-wide options

The following options manage the overall behavior of remote agents and are specified in the searchd section of the configuration file. They set default values for the entire Manticore instance.

Note, that if you use agent mirrors in the definition of your distributed table, the server will select a different mirror before each connection attempt according to the specified ha_strategy specified. In this case the agent_retry_count will be aggregated for all mirrors in the set.

For example, if you have 10 mirrors and set agent_retry_count=5, he server will attempt up to 50 retries (assuming an average of 5 tries per every 10 mirrors). In case of the option ha_strategy = roundrobin, it will actually be exactly 5 tries per mirror.

At the same time, the value provided as the retry_count option in the agent definition serves as an absolute limit. In other words, the [retry_count=2] option in the agent definition means there will be a maximum of 2 tries, regardless of whether there is 1 or 10 mirrors in the line.

agent_retry_delay

The agent_retry_delay is an integer value that determines the amount of time, in milliseconds, that Manticore Search will wait before retrying to query a remote agent in case of a failure. This value can be specified either globally in the searchd configuration or on a per-query basis using the OPTION retry_delay=XXX clause. If both options are provided, the per-query option will take precedence over the global one. The default value is 500 milliseconds (0.5 seconds). This option is only relevant if agent_retry_count or the per-query OPTION retry_count are non-zero.

client_timeout

The client_timeout option sets the maximum waiting time between requests when using persistent connections. This value is expressed in seconds or with a time suffix. The default value is 5 minutes.

Example:

client_timeout = 1h

hostname_lookup

The hostname_lookup option defines the strategy for renewing hostnames. By default, the IP addresses of agent host names are cached at server start to avoid excessive access to DNS. However, in some cases, the IP can change dynamically (e.g. cloud hosting) and it may be desirable to not cache the IPs. Setting this option to request disables the caching and queries the DNS for each query. The IP addresses can also be manually renewed using the FLUSH HOSTNAMES command.

listen_tfo

The listen_tfo option allows for the use of the TCP_FASTOPEN flag for all listeners. By default, it is managed by the system, but it can be explicitly turned off by setting it to '0'.

For more information about the TCP Fast Open extension, please refer to Wikipedia. In short, it allows to eliminate one TCP round-trip when establishing a connection.

In practice, using TFO can optimize the client-agent network efficiency, similar to when agent_persistent is in use, but without holding active connections and without limitations on the maximum number of connections.

Most modern operating systems support TFO. Linux (as one of the most progressive) has supported it since 2011, with kernels starting from 3.7 (for the server side). Windows has supported it since some builds of Windows 10. Other systems, such as FreeBSD and MacOS, are also in the game.

For Linux systems, the server checks the variable /proc/sys/net/ipv4/tcp_fastopen and behaves accordingly. Bit 0 manages the client side, while bit 1 rules the listeners. By default, the system has this parameter set to 1, i.e., clients are enabled and listeners are disabled.

persistent_connections_limit

persistent_connections_limit = 29 # assume that each host of agents has max_connections = 30 (or 29).

The persistent_connections_limit option defines the maximum number of simultaneous persistent connections to remote persistent agents. This is an instance-wide setting and must be defined in the searchd configuration section. Each time a connection to an agent defined under agent_persistent is made, we attempt to reuse an existing connection (if one exists) or create a new connection and save it for future use. However, in some cases it may be necessary to limit the number of persistent connections. This directive defines the limit and affects the number of connections to each agent's host across all distributed tables.

It is recommended to set this value equal to or less than the max_connections option in the agent's configuration.

Distributed snippets creation

A special case of a distributed table is a single local and multiple remotes, which is used exclusively for distributed snippets creation, when snippets are sourced from files. In this case, the local table may act as a "template" table, providing settings for tokenization when building snippets.

snippets_file_prefix

snippets_file_prefix = /mnt/common/server1/

The snippets_file_prefix is an optional prefix that can be added to the local file names when generating snippets. The default value is the current working folder.

To learn more about distributed snippets creation, see CALL SNIPPETS.

Distributed percolate tables (DPQ tables)

You can create a distributed table from multiple percolate tables. The syntax for constructing this type of table is the same as for other distributed tables, and can include multiplelocal tables as well as agents.

For DPQ, the operations of listing stored queries and searching through them (using CALL PQ) are transparent and work as if all the tables were one single local table. However, data manipulation statements such as insert, replace, truncate are not available.

If you include a non-percolate table in the list of agents, the behavior will be undefined. If the incorrect agent has the same schema as the outer schema of the PQ table (id, query, tags, filters), it will not trigger an error when listing stored PQ rules, and may pollute the list of actual PQ rules stored in PQ tables with its own non-PQ strings. As a result, be cautious and aware of the confusion that this may cause. ACALL PQ to such an incorrect agent will trigger an error.

For more information on making queries to a distributed percolate table, see making queries to a distribute percolate table.

¶ 7

Listing tables

Manticore Search has a single level of hierarchy for tables.

Unlike other DBMS, there is no concept of grouping tables into databases in Manticore. However, for interoperability with SQL dialects, Manticore accepts SHOW DATABASES statements for interoperability with SQL dialect, statements, but the statement does not return any results.

SHOW TABLES

General syntax:

SHOW TABLES [ LIKE pattern ]

The SHOW TABLESstatement lists all currently active tables along with their types. The existing table types are local, distributed, rt, percolate and template.

SQL
SHOW TABLES;
+----------+-------------+
| Index    | Type        |
+----------+-------------+
| dist     | distributed |
| plain    | local       |
| pq       | percolate   |
| rt       | rt          |
| template | template    |
+----------+-------------+
5 rows in set (0.00 sec)
PHP
$client->nodes()->table();
Array
(
    [dist1] => distributed
    [rt] => rt
    [products] => rt
)
Python
utilsApi.sql('SHOW TABLES')
{u'columns': [{u'Index': {u'type': u'string'}},
              {u'Type': {u'type': u'string'}}],
 u'data': [{u'Index': u'dist1', u'Type': u'distributed'},
           {u'Index': u'rt', u'Type': u'rt'},
           {u'Index': u'products', u'Type': u'rt'}],
 u'error': u'',
 u'total': 0,
 u'warning': u''}
javascript
res = await utilsApi.sql('SHOW TABLES');
{"columns":[{"Index":{"type":"string"}},{"Type":{"type":"string"}}],"data":[{"Index":"products","Type":"rt"}],"total":0,"error":"","warning":""}
Java
utilsApi.sql("SHOW TABLES")
{columns=[{Index={type=string}}, {Type={type=string}}], data=[{Index=products, Type=rt}], total=0, error=, warning=}
C#
utilsApi.Sql("SHOW TABLES")
{columns=[{Index={type=string}}, {Type={type=string}}], data=[{Index=products, Type=rt}], total=0, error="", warning=""}

Optional LIKE clause is supported for filtering tables by name.

SQL
SHOW TABLES LIKE 'pro%';
+----------+-------------+
| Index    | Type        |
+----------+-------------+
| products | distributed |
+----------+-------------+
1 row in set (0.00 sec)
PHP
$client->nodes()->table(['body'=>['pattern'=>'pro%']]);
Array
(
    [products] => distributed
)
Python
res = await utilsApi.sql('SHOW TABLES LIKE \'pro%\'');
{u'columns': [{u'Index': {u'type': u'string'}},
              {u'Type': {u'type': u'string'}}],
 u'data': [{u'Index': u'products', u'Type': u'rt'}],
 u'error': u'',
 u'total': 0,
 u'warning': u''}
javascript
utilsApi.sql('SHOW TABLES LIKE \'pro%\'')
{"columns":[{"Index":{"type":"string"}},{"Type":{"type":"string"}}],"data":[{"Index":"products","Type":"rt"}],"total":0,"error":"","warning":""}
Java
utilsApi.sql("SHOW TABLES LIKE 'pro%'")
{columns=[{Index={type=string}}, {Type={type=string}}], data=[{Index=products, Type=rt}], total=0, error=, warning=}
C#
utilsApi.Sql("SHOW TABLES LIKE 'pro%'")
{columns=[{Index={type=string}}, {Type={type=string}}], data=[{Index=products, Type=rt}], total=0, error="", warning=""}

DESCRIBE

{DESC | DESCRIBE} table [ LIKE pattern ]

The DESCRIBE statement lists the table columns and their associated types. The columns are document ID, full-text fields, and attributes. The order matches the order in which fields and attributes are expected by INSERT and REPLACE statements. Column types include field, integer, timestamp, ordinal, bool, float, bigint, string, and mva. ID column will be typed as bigint. Example:

mysql> DESC rt;
+---------+---------+
| Field   | Type    |
+---------+---------+
| id      | bigint  |
| title   | field   |
| content | field   |
| gid     | integer |
+---------+---------+
4 rows in set (0.00 sec)

An optional LIKE clause is supported. Refer to
SHOW META for its syntax details.

SELECT FROM name.@table

You can also view the table schema by executing the query select * from <table_name>.@table. The benefit of this method is that you can use the WHERE clause for filtering:

SQL
select * from tbl.@table where type='text';
+------+-------+------+----------------+
| id   | field | type | properties     |
+------+-------+------+----------------+
|    2 | title | text | indexed stored |
+------+-------+------+----------------+
1 row in set (0.00 sec)

You can also perform many other actions on <your_table_name>.@table considering it as a regular Manticore table with columns consisting of integer and string attributes.

SQL
select field from tbl.@table;
select field, properties from tbl.@table where type in ('text', 'uint');
select * from tbl.@table where properties any ('stored');

SHOW CREATE TABLE

SHOW CREATE TABLE name

Prints the CREATE TABLE statement used to create the specified table.

SQL
SHOW CREATE TABLE tbl\G
       Table: tbl
Create Table: CREATE TABLE tbl (
f text indexed stored
) charset_table='non_cjk,cjk' morphology='icu_chinese'
1 row in set (0.00 sec)

Percolate table schemas

If you use the DESC statement on a percolate table, it will display the outer table schema, which is the schema of stored queries. This schema is static and the same for all local percolate tables:

mysql> DESC pq;
+---------+--------+
| Field   | Type   |
+---------+--------+
| id      | bigint |
| query   | string |
| tags    | string |
| filters | string |
+---------+--------+
4 rows in set (0.00 sec)

If you want to view the expected document schema, use the following command:
DESC <pq table name> table:

mysql> DESC pq TABLE;
+-------+--------+
| Field | Type   |
+-------+--------+
| id    | bigint |
| title | text   |
| gid   | uint   |
+-------+--------+
3 rows in set (0.00 sec)

Also desc pq table like ... is supported and works as follows:

mysql> desc pq table like '%title%';
+-------+------+----------------+
| Field | Type | Properties     |
+-------+------+----------------+
| title | text | indexed stored |
+-------+------+----------------+
1 row in set (0.00 sec)
¶ 8

Deleting a table

Deleting a table is performed in 2 steps internally:
1. Table is cleared (similar to TRUNCATE)
2. All table files are removed from the table folder. All the external table files that were used by the table (such as wordforms, extensions or stopwords) are also deleted. Note that these external files are copied to the table folder when CREATE TABLE is used, so the original files specified in CREATE TABLE will not be deleted.

Deleting a table is possible only when the server is running in the RT mode. It is possible to delete RT tables, PQ tables and distributed tables.

SQL
DROP TABLE products;
Query OK, 0 rows affected (0.02 sec)
JSON
POST /cli -d "DROP TABLE products"
{
"total":0,
"error":"",
"warning":""
}
PHP
$params = [ 'index' => 'products' ];

$response = $client->indices()->drop($params);
Array
(
    [total] => 0
    [error] =>
    [warning] =>
)
Python
utilsApi.sql('DROP TABLE products')
{u'error': u'', u'total': 0, u'warning': u''}
javascript
res = await utilsApi.sql('DROP TABLE products');
{"total":0,"error":"","warning":""}
Java
sqlresult = utilsApi.sql("DROP TABLE products");
{total=0, error=, warning=}
C#
sqlresult = utilsApi.Sql("DROP TABLE products");
{total=0, error="", warning=""}

Here is the syntax of the DROP TABLE statement in SQL:

DROP TABLE [IF EXISTS] index_name

When deleting a table via SQL, adding IF EXISTS can be used to delete the table only if it exists. If you try to delete a non-existing table with the IF EXISTS option, nothing happens.

When deleting a table via PHP, you can add an optional silent parameter which works the same as IF EXISTS.

SQL
DROP TABLE IF EXISTS products;
JSON
POST /cli -d "DROP TABLE IF EXISTS products"
PHP
$params =
[
  'index' => 'products',
  'body' => ['silent' => true]
];

$client->indices()->drop($params);
Python
utilsApi.sql('DROP TABLE IF EXISTS products')
{u'error': u'', u'total': 0, u'warning': u''}
javascript
res = await utilsApi.sql('DROP TABLE IF EXISTS products');
{"total":0,"error":"","warning":""}
Java
sqlresult = utilsApi.sql("DROP TABLE IF EXISTS products");
{total=0, error=, warning=}
C#
sqlresult = utilsApi.Sql("DROP TABLE IF EXISTS products");
{total=0, error="", warning=""}
¶ 9

Emptying a table

The table can be emptied with a TRUNCATE TABLE SQL statement or with a truncate() PHP client function.

Here is the syntax for the SQL statement:

TRUNCATE TABLE index_name [WITH RECONFIGURE]

When this statement is executed, it clears the RT table completely. It disposes the in-memory data, unlinks all the table data files, and releases the associated binary logs.

A table can also be emptied with DELETE FROM index WHERE id>0, but it's not recommended as it's slower than TRUNCATE.

SQL
TRUNCATE TABLE products;
Query OK, 0 rows affected (0.02 sec)
JSON
POST /cli -d "TRUNCATE TABLE products"
{
"total":0,
"error":"",
"warning":""
}
PHP
$params = [ 'index' => 'products' ];
$response = $client->indices()->truncate($params);
Array(
    [total] => 0
    [error] => 
    [warning] => 
)
Python
utilsApi.sql('TRUNCATE TABLE products')
{u'error': u'', u'total': 0, u'warning': u''}
javascript
res = await utilsApi.sql('TRUNCATE TABLE products');
{"total":0,"error":"","warning":""}
Java
utilsApi.sql("TRUNCATE TABLE products");
{total=0, error=, warning=}
C#
utilsApi.Sql("TRUNCATE TABLE products");
{total=0, error="", warning=""}

One of the possible uses of this command is before attaching a table.

When RECONFIGURE option is used new tokenization, morphology, and other text processing settings specified in the config take effect after the table gets cleared. In case the schema declaration in config is different from the table schema the new schema from config got applied after table get cleared.

With this option clearing and reconfiguring a table becomes one atomic operation.

SQL
TRUNCATE TABLE products with reconfigure;
Query OK, 0 rows affected (0.02 sec)
HTTP
POST /cli -d "TRUNCATE TABLE products with reconfigure"
{
"total":0,
"error":"",
"warning":""
}
PHP
$params = [ 'index' => 'products', 'with' => 'reconfigure' ];
$response = $client->indices()->truncate($params);
Array(
    [total] => 0
    [error] => 
    [warning] => 
)
Python
utilsApi.sql('TRUNCATE TABLE products WITH RECONFIGURE')
{u'error': u'', u'total': 0, u'warning': u''}
javascript
res = await utilsApi.sql('TRUNCATE TABLE products WITH RECONFIGURE');
{"total":0,"error":"","warning":""}
Java
utilsApi.sql("TRUNCATE TABLE products WITH RECONFIGURE");
{total=0, error=, warning=}
C#
utilsApi.Sql("TRUNCATE TABLE products WITH RECONFIGURE");
{total=0, error="", warning=""}
¶ 10

Manticore cluster

Manticore Search is a highly distributed system that provides all the necessary components to create a highly available and scalable database for search. This includes:

Manticore Search offers great flexibility in terms of how you set up your cluster. There are no limitations, so it's up to you to design your cluster according to your needs. Simply learn about the tools mentioned above and use them to achieve your desired goal.

¶ 10.1

Adding a new node

To add a new node to a cluster, simply start another instance of Manticore and ensure that it is accessible by the other nodes in the cluster. Connect the new node to the rest of the cluster using a distributed table and ensure data safety with replication.

¶ 10.2

Adding a distributed table with remote agents

To understand how to add a distributed table with remote agents, it is important to first have a basic understanding of distributed tables In this article, we will focus on how to use a distributed table as the basis for creating a cluster of Manticore instances.

Here is an example of how to split data over 4 servers, each serving one of the shards:

ini
table mydist {
          type  = distributed
          agent = box1:9312:shard1
          agent = box2:9312:shard2
          agent = box3:9312:shard3
          agent = box4:9312:shard4
}

In the event of a server failure, the distributed table will still work, but the results from the failed shard will be missing.

Now that we've added mirrors, each shard is found on 2 servers. By default, the master (the searchd instance with the distributed table) will randomly pick one of the mirrors.

The mode used for picking mirrors can be set using the ha_strategy setting. In addition to the default random mode there's also ha_strategy = roundrobin.

More advanced strategies based on latency-weighted probabilities include noerrors and nodeads. These not only take out mirrors with issues but also monitor response times and do balancing. If a mirror responds slower (for example, due to some operations running on it), it will receive fewer requests. When the mirror recovers and provides better times, it will receive more requests.

ini
table mydist {
          type  = distributed
          agent = box1:9312|box5:9312:shard1
          agent = box2:9312:|box6:9312:shard2
          agent = box3:9312:|box7:9312:shard3
          agent = box4:9312:|box8:9312:shard4
}
¶ 10.2.1

Mirroring

Agent mirrors can be used interchangeably when processing a search query. The Manticore instance(s) hosting the distributed table where the mirrored agents are defined keeps track of mirror status (alive or dead) and response times, and performs automatic failover and load balancing based on this information.

Agent mirrors

agent = node1|node2|node3:9312:shard2

The above example declares that node1:9312, node2:9312, and node3:9312 all have a table called shard2, and can be used as interchangeable mirrors. If any of these servers go down, the queries will be distributed between the remaining two. When the server comes back online, the master will detect it and begin routing queries to all three nodes again.

A mirror may also include an individual table list, as follows:

agent = node1:9312:node1shard2|node2:9312:node2shard2

This works similarly to the previous example, but different table names will be used when querying different servers. For example, node1shard2 will be used when querying node1:9312, and node2shard will be used when querying node2:9312.

By default, all queries are routed to the best of the mirrors. The best mirror is selected based on recent statistics, as controlled by the ha_period_karma config directive. The master stores metrics (total query count, error count, response time, etc.) for each agent and groups these by time spans. The karma is the length of the time span. The best agent mirror is then determined dynamically based on the last two such time spans. The specific algorithm used to pick a mirror can be configured with the ha_strategy directive.

The karma period is in seconds and defaults to 60 seconds. The master stores up to 15 karma spans with per-agent statistics for instrumentation purposes (see SHOW AGENT STATUS statement). However, only the last two spans out of these are used for HA/LB logic.

When there are no queries, the master sends a regular ping command every ha_ping_interval milliseconds in order to collect statistics and check if the remote host is still alive. The ha_ping_interval defaults to 1000 msec. Setting it to 0 disables pings, and statistics will only be accumulated based on actual queries.

Example:

# sharding table over 4 servers total
# in just 2 shards but with 2 failover mirrors for each shard
# node1, node2 carry shard1 as local
# node3, node4 carry shard2 as local

# config on node1, node2
agent = node3:9312|node4:9312:shard2

# config on node3, node4
agent = node1:9312|node2:9312:shard1
¶ 10.2.2

Load balancing

Load balancing is turned on by default for any distributed table that uses mirroring. By default, queries are distributed randomly among the mirrors. You can change this behavior by using the ha_strategy.

ha_strategy

ha_strategy = {random|nodeads|noerrors|roundrobin}

The mirror selection strategy for load balancing is optional and is set to random by default.

The strategy used for mirror selection, or in other words, choosing a specific agent mirror in a distributed table, is controlled by this directive. Essentially, this directive controls how master performs the load balancing between the configured mirror agent nodes. The following strategies are implemented:

Simple random balancing

The default balancing mode is simple linear random distribution among the mirrors. This means that equal selection probabilities are assigned to each mirror. This is similar to round-robin (RR), but does not impose a strict selection order.

Example
ha_strategy = random

Adaptive randomized balancing

The default simple random strategy does not take into account the status of mirrors, error rates, and most importantly, actual response latencies. To address heterogeneous clusters and temporary spikes in agent node load, there are a group of balancing strategies that dynamically adjust the probabilities based on the actual query latencies observed by the master.

The adaptive strategies based on latency-weighted probabilities work as follows:

  1. Latency stats are accumulated in blocks of ha_period_karma seconds.
  2. Latency-weighted probabilities are recomputed once per karma period.
  3. The "dead or alive" flag is adjusted once per request, including ping requests.

Initially, the probabilities are equal. On every step, they are scaled by the inverse of the latencies observed during the last karma period, and then renormalized. For example, if during the first 60 seconds after the master startup, 4 mirrors had latencies of 10 ms, 5 ms, 30 ms, and 3 ms respectively, the first adjustment step would go as follows:

  1. Initial percentages: 0.25, 0.25, 0.25, 0.25.
  2. Observed latencies: 10 ms, 5 ms, 30 ms, 3 ms.
  3. Inverse latencies: 0.1, 0.2, 0.0333, 0.333.
  4. Scaled percentages: 0.025, 0.05, 0.008333, 0.0833.
  5. Renormalized percentages: 0.15, 0.30, 0.05, 0.50.

This means that the first mirror would have a 15% chance of being chosen during the next karma period, the second one a 30% chance, the third one (slowest at 30 ms) only a 5% chance, and the fourth and fastest one (at 3 ms) a 50% chance. After that period, the second adjustment step would update those chances again, and so on.

The idea is that once the observed latencies stabilize, the latency weighted probabilities will stabilize as well. All these adjustment iterations are meant to converge at a point where the average latencies are roughly equal across all mirrors.

nodeads

Latency-weighted probabilities, but dead mirrors are excluded from the selection. A "dead" mirror is defined as a mirror that has resulted in multiple hard errors (e.g. network failure, or no answer, etc) in a row.

Example
ha_strategy = nodeads

noerrors

Latency-weighted probabilities, but mirrors with a worse error/success ratio are excluded from selection.

Example
ha_strategy = noerrors

Round-robin balancing

Simple round-robin selection, that is, selecting the first mirror in the list, then the second one, then the third one, etc, and then repeating the process once the last mirror in the list is reached. Unlike with the randomized strategies, RR imposes a strict querying order (1, 2, 3, ..., N-1, N, 1, 2, 3, ..., and so on) and guarantees that no two consecutive queries will be sent to the same mirror.

Example
ha_strategy = roundrobin

Instance-wide options

ha_period_karma

ha_period_karma = 2m

ha_period_karma defines the size of the agent mirror statistics window, in seconds (or a time suffix). Optional, the default is 60.

For a distributed table with agent mirrors, the server tracks several different per-mirror counters. These counters are then used for failover and balancing. (The server picks the best mirror to use based on the counters.) Counters are accumulated in blocks of ha_period_karma seconds.

After beginning a new block, the master may still use the accumulated values from the previous one until the new one is half full. Thus, any previous history stops affecting the mirror choice after at most 1.5 times ha_period_karma seconds.

Although at most 2 blocks are used for mirror selection, up to 15 last blocks are actually stored for instrumentation purposes. They can be inspected using the SHOW AGENT STATUS statement.

ha_ping_interval

ha_ping_interval = 3s

ha_ping_interval directive defines the interval between pings sent to the agent mirrors, in milliseconds (or with a time suffix). This option is optional and its default value is 1000.

For a distributed table with agent mirrors, the server sends all mirrors a ping command during idle periods to track their current status (whether they are alive or dead, network roundtrip time, etc.). The interval between pings is determined by the ha_ping_interval setting.

If you want to disable pings, set ha_ping_interval to 0.

¶ 10.3

Setting up replication

With Manticore, write transactions (such as INSERT, REPLACE, DELETE, TRUNCATE, UPDATE, COMMIT) can be replicated to other cluster nodes before the transaction is fully applied on the current node. Currently, replication is supported for percolate, rt and distributed tables in Linux and macOS. However, Manticore Search packages for Windows do not provide replication support.

Manticore's replication is powered by the Galera library and boasts several impressive features:

To set up replication in Manticore Search:

If there is no replication listen directive set, Manticore will use the first two free ports in the range of 200 ports after the default protocol listening port for each created cluster. To set replication ports manually, the listen directive (of replication type) port range must be defined and the address/port range pairs must not intersect between different nodes on the same server. As a rule of thumb, the port range should specify at least two ports per cluster.

Replication cluster

A replication cluster is a group of nodes in which a write transaction is replicated. Replication is set up on a per-table basis, meaning that one table can only belong to one cluster. There is no limit on the number of tables that a cluster can have. All transactions such as INSERT, REPLACE, DELETE, TRUNCATE on any percolate or real-time table that belongs to a cluster are replicated to all the other nodes in that cluster. Distributed tables can also be part of the replication process. Replication is multi-master, so writes to any node or multiple nodes simultaneously will work just as well.

To create a cluster, you can typically use the command create cluster with CREATE CLUSTER <cluster name>, and to join a cluster, you can use join cluster with JOIN CLUSTER <cluster name> at 'host:port'. However, in some rare cases, you may want to fine-tune the behavior of CREATE/JOIN CLUSTER. The available options are:

name

This option specifies the name of the cluster. It should be unique among all the clusters in the system.

Note: The maximum allowable hostname length for the JOIN command is 253 characters. If you exceed this limit, searchd will generate an error.

path

The path option specifies the data directory for write-set cache replication and incoming tables from other nodes. This value should be unique among all the clusters in the system and should be specified as a relative path to the data_dir. directory. By default, it is set to the value of data_dir.

nodes

The nodes option is a list of address:port pairs for all the nodes in the cluster, separated by commas. This list should be obtained using the node's API interface and can include the address of the current node as well. It is used to join the node to the cluster and to rejoin it after a restart.

options

The options option allows you to pass additional options directly to the Galera replication plugin, as described in the Galera Documentation Parameters

Write statements

When working with a replication cluster, all write statements such as INSERT, REPLACE, DELETE, TRUNCATE, UPDATE that modify the content of a cluster's table must use thecluster_name:index_name expression instead of the table name. This ensures that the changes are propagated to all replicas in the cluster. If the correct expression is not used, an error will be triggered.

In the JSON interface, the cluster property must be set along with the table name for all write statements to a cluster's table. Failure to set the cluster property will result in an error.

The Auto ID for a table in a cluster should be valid as long as the server_id is correctly configured.

SQL
INSERT INTO posts:weekly_index VALUES ( 'iphone case' )
TRUNCATE RTINDEX click_query:weekly_index
UPDATE INTO posts:rt_tags SET tags=(101, 302, 304) WHERE MATCH ('use') AND id IN (1,101,201)
DELETE FROM clicks:rt WHERE MATCH ('dumy') AND gid>206
JSON
POST /insert -d '
{
  "cluster":"posts",
  "index":"weekly_index",
  "doc":
  {
    "title" : "iphone case",
    "price" : 19.85
  }
}'
POST /delete -d '
{
  "cluster":"posts",
  "index": "weekly_index",
  "id":1
}'
PHP
$index->addDocuments([
        1, ['title' => 'iphone case', 'price' => 19.85]
]);
$index->deleteDocument(1);
Python
indexApi.insert({"cluster":"posts","index":"weekly_index","doc":{"title":"iphone case","price":19.85}})
indexApi.delete({"cluster":"posts","index":"weekly_index","id":1})
Javascript
res = await indexApi.insert({"cluster":"posts","index":"weekly_index","doc":{"title":"iphone case","price":19.85}});
 res = await indexApi.delete({"cluster":"posts","index":"weekly_index","id":1});
Java
InsertDocumentRequest newdoc = new InsertDocumentRequest();
HashMap<String,Object> doc = new HashMap<String,Object>(){{
    put("title","Crossbody Bag with Tassel");
    put("price",19.85);
}};
newdoc.index("weekly_index").cluster("posts").id(1L).setDoc(doc);
sqlresult = indexApi.insert(newdoc);

DeleteDocumentRequest deleteRequest = new DeleteDocumentRequest();
deleteRequest.index("weekly_index").cluster("posts").setId(1L);
indexApi.delete(deleteRequest);
C#
Dictionary<string, Object> doc = new Dictionary<string, Object>();
doc.Add("title", "Crossbody Bag with Tassel");
doc.Add("price", 19.85);
InsertDocumentRequest newdoc = new InsertDocumentRequest(index: "weekly_index", cluster:posts, id: 1, doc: doc);
var sqlresult = indexApi.Insert(newdoc);

DeleteDocumentRequest deleteDocumentRequest = new DeleteDocumentRequest(index: "weekly_index", cluster: "posts", id: 1);
indexApi.Delete(deleteDocumentRequest);

Read statements

Read statements such as SELECT, CALL PQ, DESCRIBE can either use regular table names that are not prepended with a cluster name, or they can use the cluster_name:index_nameformat. If the latter is used, the cluster_name component is ignored.

When using the HTTP endpoint json/search, the cluster property can be specified if desired, but it can also be omitted.

SQL
SELECT * FROM weekly_index
CALL PQ('posts:weekly_index', 'document is here')
JSON
POST /search -d '
{
  "cluster":"posts",
  "index":"weekly_index",
  "query":{"match":{"title":"keyword"}}
}'
POST /search -d '
{
  "index":"weekly_index",
  "query":{"match":{"title":"keyword"}}
}'

Cluster parameters

Replication plugin options can be adjusted using the SET statement.

A list of available options can be found in the Galera Documentation Parameters .

SQL
SET CLUSTER click_query GLOBAL 'pc.bootstrap' = 1
JSON
POST /cli -d "
SET CLUSTER click_query GLOBAL 'pc.bootstrap' = 1
"

Cluster with diverged nodes

It's possible for replicated nodes to diverge from one another, leading to a state where all nodes are labeled as non-primary. This can occur as a result of a network split between nodes, a cluster crash, or if the replication plugin experiences an exception when determining the primary component. In such a scenario, it's necessary to select a node and promote it to the role of primary component.

To identify the node that needs to be promoted, you should compare the last_committed cluster status variable value on all nodes. If all the servers are currently running, there's no need to restart the cluster. Instead, you can simply promote the node with the highest last_committed value to the primary component using the SET statement (as demonstrated in the example).

The other nodes will then reconnect to the primary component and resynchronize their data based on this node.

SQL
SET CLUSTER posts GLOBAL 'pc.bootstrap' = 1
JSON
POST /cli -d "
SET CLUSTER posts GLOBAL 'pc.bootstrap' = 1
"

Replication and cluster

To use replication, you need to define one listen port for SphinxAPI protocol and one listen for replication address and port range in the configuration file. Also, specify the data_dir folder to receive incoming tables.

ini
searchd {
  listen   = 9312
  listen   = 192.168.1.101:9360-9370:replication
  data_dir = /var/lib/manticore/
  ...
 }

To replicate tables, you must create a cluster on the server that has the local tables to be replicated.

SQL
CREATE CLUSTER posts
JSON
POST /cli -d "
CREATE CLUSTER posts
"
PHP
$params = [
    'cluster' => 'posts'
    ]
];
$response = $client->cluster()->create($params);
Python
utilsApi.sql('CREATE CLUSTER posts')
Javascript
res = await utilsApi.sql('CREATE CLUSTER posts');
Java
utilsApi.sql("CREATE CLUSTER posts");
C#
utilsApi.Sql("CREATE CLUSTER posts");

Add these local tables to the cluster

SQL
ALTER CLUSTER posts ADD pq_title
ALTER CLUSTER posts ADD pq_clicks
JSON
POST /cli -d "
ALTER CLUSTER posts ADD pq_title
"
POST /cli -d "
ALTER CLUSTER posts ADD pq_clicks
"
PHP
$params = [
  'cluster' => 'posts',
  'body' => [
     'operation' => 'add',
     'index' => 'pq_title'

  ]
];
$response = $client->cluster()->alter($params);
$params = [
  'cluster' => 'posts',
  'body' => [
     'operation' => 'add',
     'index' => 'pq_clicks'

  ]
];
$response = $client->cluster()->alter($params);   
Python
utilsApi.sql('ALTER CLUSTER posts ADD pq_title')
utilsApi.sql('ALTER CLUSTER posts ADD pq_clicks')
Javascript
res = await utilsApi.sql('ALTER CLUSTER posts ADD pq_title');
res = await utilsApi.sql('ALTER CLUSTER posts ADD pq_clicks');
Java
utilsApi.sql("ALTER CLUSTER posts ADD pq_title");
utilsApi.sql("ALTER CLUSTER posts ADD pq_clicks");
C#
utilsApi.Sql("ALTER CLUSTER posts ADD pq_title");
utilsApi.Sql("ALTER CLUSTER posts ADD pq_clicks");

All other nodes that wish to receive a replica of the cluster's tables should join the cluster as follows:

SQL
JOIN CLUSTER posts AT '192.168.1.101:9312'
JSON
POST /cli -d "
JOIN CLUSTER posts AT '192.168.1.101:9312'
"
PHP
$params = [
  'cluster' => 'posts',
  'body' => [
      '192.168.1.101:9312'
  ]
];
$response = $client->cluster->join($params);
Python
utilsApi.sql('JOIN CLUSTER posts AT \'192.168.1.101:9312\'')
Javascript
res = await utilsApi.sql('JOIN CLUSTER posts AT \'192.168.1.101:9312\'');
Java
utilsApi.sql("JOIN CLUSTER posts AT '192.168.1.101:9312'");
C#
utilsApi.Sql("JOIN CLUSTER posts AT '192.168.1.101:9312'");

When running queries, prepend the table name with the cluster name posts: or use the cluster property for HTTP request object.

SQL
INSERT INTO posts:pq_title VALUES ( 3, 'test me' )
JSON
POST /insert -d '
{
  "cluster":"posts",
  "index":"pq_title",
  "id": 3
  "doc":
  {
    "title" : "test me"
  }
}'
PHP
$index->addDocuments([
        3, ['title' => 'test me']
]);
Python
indexApi.insert({"cluster":"posts","index":"pq_title","id":3"doc":{"title":"test me"}})
Javascript
res = await indexApi.insert({"cluster":"posts","index":"pq_title","id":3"doc":{"title":"test me"}});
Java
InsertDocumentRequest newdoc = new InsertDocumentRequest();
HashMap<String,Object> doc = new HashMap<String,Object>(){{
    put("title","test me");
}};
newdoc.index("pq_title").cluster("posts").id(3L).setDoc(doc);
sqlresult = indexApi.insert(newdoc);
C#
Dictionary<string, Object> doc = new Dictionary<string, Object>();
doc.Add("title", "test me");
InsertDocumentRequest newdoc = new InsertDocumentRequest(index: "pq_title", cluster: "posts", id: 3, doc: doc);
var sqlresult = indexApi.Insert(newdoc);

All queries that modify tables in the cluster are now replicated to all nodes in the cluster.

¶ 10.3.1

Creating a replication cluster

To create a replication cluster, you must set its name at a minimum.

If you are creating a single cluster or the first cluster, you may omit the path option. In this case, the data_dir option will be used as the cluster path. However, for all subsequent clusters, you must specify the path and the path must be available. The nodes option may also be set to list all nodes in the cluster.

SQL
CREATE CLUSTER posts
CREATE CLUSTER click_query '/var/data/click_query/' as path
CREATE CLUSTER click_query '/var/data/click_query/' as path, 'clicks_mirror1:9312,clicks_mirror2:9312,clicks_mirror3:9312' as nodes
JSON
POST /cli -d "
CREATE CLUSTER posts
"
POST /cli -d "
CREATE CLUSTER click_query '/var/data/click_query/' as path
"
POST /cli -d "
CREATE CLUSTER click_query '/var/data/click_query/' as path, 'clicks_mirror1:9312,clicks_mirror2:9312,clicks_mirror3:9312' as nodes
"
PHP
$params = [
    'cluster' => 'posts',
    ]
];
$response = $client->cluster()->create($params);
$params = [
    'cluster' => 'click_query',
    'body' => [
        'path' => '/var/data/click_query/'
    ]    
    ]
];
$response = $client->cluster()->create($params);
$params = [
    'cluster' => 'click_query',
    'body' => [
        'path' => '/var/data/click_query/',
        'nodes' => 'clicks_mirror1:9312,clicks_mirror2:9312,clicks_mirror3:9312'
    ]    
    ]
];
$response = $client->cluster()->create($params);
Python
utilsApi.sql('CREATE CLUSTER posts')
utilsApi.sql('CREATE CLUSTER click_query \'/var/data/click_query/\' as path')
utilsApi.sql('CREATE CLUSTER click_query \'/var/data/click_query/\' as path, \'clicks_mirror1:9312,clicks_mirror2:9312,clicks_mirror3:9312\' as nodes')
javascript
res = await utilsApi.sql('CREATE CLUSTER posts');
res = await utilsApi.sql('CREATE CLUSTER click_query \'/var/data/click_query/\' as path');
res = await utilsApi.sql('CREATE CLUSTER click_query \'/var/data/click_query/\' as path, \'clicks_mirror1:9312,clicks_mirror2:9312,clicks_mirror3:9312\' as nodes');
Java
utilsApi.sql("CREATE CLUSTER posts");
utilsApi.sql("CREATE CLUSTER click_query '/var/data/click_query/' as path");
utilsApi.sql("CREATE CLUSTER click_query '/var/data/click_query/' as path, 'clicks_mirror1:9312,clicks_mirror2:9312,clicks_mirror3:9312' as nodes");
C#
utilsApi.Sql("CREATE CLUSTER posts");
utilsApi.Sql("CREATE CLUSTER click_query '/var/data/click_query/' as path");
utilsApi.Sql("CREATE CLUSTER click_query '/var/data/click_query/' as path, 'clicks_mirror1:9312,clicks_mirror2:9312,clicks_mirror3:9312' as nodes");

If the nodes option is not specified when creating a cluster, the first node that joins the cluster will be saved as the nodes option.

¶ 10.3.2

Joining a replication cluster

To join an existing cluster, you must specify at least:

SQL
JOIN CLUSTER posts AT '10.12.1.35:9312'
JSON
POST /cli -d "
JOIN CLUSTER posts AT '10.12.1.35:9312'
"
PHP
$params = [
  'cluster' => 'posts',
  'body' => [
      '10.12.1.35:9312'
  ]
];
$response = $client->cluster->join($params);
Python
utilsApi.sql('JOIN CLUSTER posts AT \'10.12.1.35:9312\'')
{u'error': u'', u'total': 0, u'warning': u''}
javascript
res = await utilsApi.sql('JOIN CLUSTER posts AT \'10.12.1.35:9312\'');
{"total":0,"error":"","warning":""}
Java
utilsApi.sql("JOIN CLUSTER posts AT '10.12.1.35:9312'");
C#
utilsApi.Sql("JOIN CLUSTER posts AT '10.12.1.35:9312'");

In most cases, the above is sufficient when there is a single replication cluster. However, if you are creating multiple replication clusters, you must also set the path and ensure that the directory is available.

SQL
JOIN CLUSTER c2 at '127.0.0.1:10201' 'c2' as path

A node joins a cluster by obtaining data from a specified node and, if successful, updates the node lists across all other cluster nodes in the same way as if it was done manually through ALTER CLUSTER ... UPDATE nodes. This list is used to re-join nodes to the cluster upon restart.

There are two lists of nodes:
1.cluster_<name>_nodes_set: used to re-join nodes to the cluster upon restart. It is updated across all nodes in the same way as ALTER CLUSTER ... UPDATE nodes does. JOIN CLUSTER command performs this update automatically. The Cluster status displays this list as cluster_<name>_nodes_set.
2. cluster_<name>_nodes_view: this list contains all active nodes used for replication and does not require manual management. ALTER CLUSTER ... UPDATE nodes actually copies this list of nodes to the list of nodes used to re-join upon restart. The Cluster status displays this list as cluster_<name>_nodes_view.

When nodes are located in different network segments or data centers, the nodes option may be set explicitly. This minimizes traffic between nodes and utilizes gateway nodes for intercommunication between data centers. The following code joins an existing cluster using the nodes option.

Note: The cluster cluster_<name>_nodes_set list is not updated automatically when this syntax is used. To update it, use ALTER CLUSTER ... UPDATE nodes.

SQL
JOIN CLUSTER click_query 'clicks_mirror1:9312;clicks_mirror2:9312;clicks_mirror3:9312' as nodes
JSON
POST /cli -d "
JOIN CLUSTER click_query 'clicks_mirror1:9312;clicks_mirror2:9312;clicks_mirror3:9312' as nodes
"
PHP
$params = [
  'cluster' => 'posts',
  'body' => [
      'nodes' => 'clicks_mirror1:9312;clicks_mirror2:9312;clicks_mirror3:9312'
  ]
];
$response = $client->cluster->join($params);
Python
utilsApi.sql('JOIN CLUSTER click_query \'clicks_mirror1:9312;clicks_mirror2:9312;clicks_mirror3:9312\' as nodes')
{u'error': u'', u'total': 0, u'warning': u''}
javascript
res = await utilsApi.sql('JOIN CLUSTER click_query \'clicks_mirror1:9312;clicks_mirror2:9312;clicks_mirror3:9312\' as nodes');
{"total":0,"error":"","warning":""}
Java
utilsApi.sql("JOIN CLUSTER click_query 'clicks_mirror1:9312;clicks_mirror2:9312;clicks_mirror3:9312' as nodes");
C#
utilsApi.Sql("JOIN CLUSTER click_query 'clicks_mirror1:9312;clicks_mirror2:9312;clicks_mirror3:9312' as nodes");

The JOIN CLUSTER command works synchronously and completes as soon as the node receives all data from the other nodes in the cluster and is in sync with them.

¶ 10.3.3

Deleting a replication cluster

The DELETE CLUSTER statement removes the specified cluster with its name. Once the cluster is deleted, it is removed from all nodes, but its tables remain intact and become active local non-replicated tables.

SQL
DELETE CLUSTER click_query
JSON
POST /cli -d "DELETE CLUSTER click_query"
PHP
$params = [
    'cluster' => 'click_query',
    'body' => []
];
$response = $client->cluster()->delete($params);                
Python
utilsApi.sql('DELETE CLUSTER click_query')
{u'error': u'', u'total': 0, u'warning': u''}
javascript
res = await utilsApi.sql('DELETE CLUSTER click_query');
{"total":0,"error":"","warning":""}
Java
utilsApi.sql("DELETE CLUSTER click_query");
C#
utilsApi.Sql("DELETE CLUSTER click_query");
¶ 10.3.4

Adding and removing a table from a replication cluster

ALTER CLUSTER <cluster_name> ADD <table_name> adds an existing local table to the cluster. The node that receives the ALTER query sends the table to the other nodes in the cluster. All the local tables with the same name on the other nodes of the cluster are replaced with the new table.

Once the table is replicated, write statements can be performed on any node, but the table name must be prefixed with the cluster name, like INSERT INTO <clusterName>:<table_name>.

SQL
ALTER CLUSTER click_query ADD clicks_daily_index
JSON
POST /cli -d "
ALTER CLUSTER click_query ADD clicks_daily_index
"
PHP
$params = [
  'cluster' => 'click_query',
  'body' => [
     'operation' => 'add',
     'index' => 'clicks_daily_index'

  ]
];
$response = $client->cluster()->alter($params);        
Python
utilsApi.sql('ALTER CLUSTER click_query ADD clicks_daily_index')
{u'error': u'', u'total': 0, u'warning': u''}
javascript
res = await utilsApi.sql('ALTER CLUSTER click_query ADD clicks_daily_index');
{"total":0,"error":"","warning":""}
Java
utilsApi.sql("ALTER CLUSTER click_query ADD clicks_daily_index");
C#
utilsApi.Sql("ALTER CLUSTER click_query ADD clicks_daily_index");

ALTER CLUSTER <cluster_name> DROP <table_name> forgets about a local table, meaning it does not remove the table files on the nodes, but rather just makes it an inactive, non-replicated table.

Once a table is removed from a cluster, it becomes a local table, and write statements must use just the table name, like INSERT INTO <table_name>, without the cluster prefix.

SQL
ALTER CLUSTER posts DROP weekly_index
JSON
POST /cli -d "
ALTER CLUSTER posts DROP weekly_index
"
PHP
$params = [
  'cluster' => 'posts',
  'body' => [
     'operation' => 'drop',
     'index' => 'weekly_index'

  ]
];
$response = $client->cluster->alter($params);
Python
utilsApi.sql('ALTER CLUSTER posts DROP weekly_index')
{u'error': u'', u'total': 0, u'warning': u''}
javascript
res = await utilsApi.sql('ALTER CLUSTER posts DROP weekly_index');
{"total":0,"error":"","warning":""}
Java
utilsApi.sql("ALTER CLUSTER posts DROP weekly_index");
C#
utilsApi.Sql("ALTER CLUSTER posts DROP weekly_index");
¶ 10.3.5

Managing replication nodes

The ALTER CLUSTER <cluster_name> UPDATE nodes statement updates the node lists on each node within the specified cluster to include all active nodes in the cluster. For more information on node lists, see Joining a cluster.

SQL
ALTER CLUSTER posts UPDATE nodes
JSON
POST /cli -d "
ALTER CLUSTER posts UPDATE nodes
"
PHP
$params = [
  'cluster' => 'posts',
  'body' => [
     'operation' => 'update',

  ]
];
$response = $client->cluster()->alter($params); 
Python
utilsApi.sql('ALTER CLUSTER posts UPDATE nodes')
{u'error': u'', u'total': 0, u'warning': u''}
javascript
res = await utilsApi.sql('ALTER CLUSTER posts UPDATE nodes');
{"total":0,"error":"","warning":""}
Java
utilsApi.sql("ALTER CLUSTER posts UPDATE nodes");
C#
utilsApi.Sql("ALTER CLUSTER posts UPDATE nodes");

For instance, when the cluster was initially established, the list of nodes used to rejoin the cluster was 10.10.0.1:9312,10.10.1.1:9312. Since then, other nodes joined the cluster and now the active nodes are 10.10.0.1:9312,10.10.1.1:9312,10.15.0.1:9312,10.15.0.3:9312.However, the list of nodes used to rejoin the cluster has not been updated.

To rectify this, you can run the ALTER CLUSTER ... UPDATE nodes statement to copy the list of active nodes to the list of nodes used to rejoin the cluster. After this, the list of nodes used to rejoin the cluster will include all the active nodes in the cluster.

Both lists of nodes can be viewed using the Cluster status statement (cluster_post_nodes_set and cluster_post_nodes_view).

Removing node from cluster

To remove a node from the replication cluster, follow these steps:
1. Stop the node
2. Remove the information about the cluster from <data_dir>/manticore.json (usually /var/lib/manticore/manticore.json) on the node that has been stopped.
3. Run ALTER CLUSTER cluster_name UPDATE nodes on any other node.

After these steps, the other nodes will forget about the detached node and the detached node will forget about the cluster. This action will not impact the tables in the cluster or on the detached node.

¶ 10.3.6

Replication cluster status

You can view the cluster status information by checking the node status. This can be done using the Node status command, which displays various information about the node, including the cluster status variables.

The output format for the cluster status variables is as follows: cluster_name_variable_name variable_value. Most of the variables are described in the Galera Documentation Status Variables. In addition to these variables, Manticore Search also displays:

SQL
SHOW STATUS
+----------------------------+-------------------------------------------------------------------------------------+
| Counter                    | Value                                                                               |
+----------------------------+-------------------------------------------------------------------------------------+
| cluster_name               | post                                                                                |
| cluster_post_state_uuid    | fba97c45-36df-11e9-a84e-eb09d14b8ea7                                                |
| cluster_post_conf_id       | 1                                                                                   |
| cluster_post_status        | primary                                                                             |
| cluster_post_size          | 5                                                                                   |
| cluster_post_local_index   | 0                                                                                   |
| cluster_post_node_state    | synced                                                                              |
| cluster_post_indexes_count | 2                                                                                   |
| cluster_post_indexes       | pq1,pq_posts                                                                        |
| cluster_post_nodes_set     | 10.10.0.1:9312                                                                      |
| cluster_post_nodes_view    | 10.10.0.1:9312,10.10.0.1:9320:replication,10.10.1.1:9312,10.10.1.1:9320:replication |
JSON
POST /cli -d "
SHOW STATUS
"
"
{"columns":[{"Counter":{"type":"string"}},{"Value":{"type":"string"}}],
"data":[
{"Counter":"cluster_name", "Value":"post"},
{"Counter":"cluster_post_state_uuid", "Value":"fba97c45-36df-11e9-a84e-eb09d14b8ea7"},
{"Counter":"cluster_post_conf_id", "Value":"1"},
{"Counter":"cluster_post_status", "Value":"primary"},
{"Counter":"cluster_post_size", "Value":"5"},
{"Counter":"cluster_post_local_index", "Value":"0"},
{"Counter":"cluster_post_node_state", "Value":"synced"},
{"Counter":"cluster_post_indexes_count", "Value":"2"},
{"Counter":"cluster_post_indexes", "Value":"pq1,pq_posts"},
{"Counter":"cluster_post_nodes_set", "Value":"10.10.0.1:9312"},
{"Counter":"cluster_post_nodes_view", "Value":"10.10.0.1:9312,10.10.0.1:9320:replication,10.10.1.1:9312,10.10.1.1:9320:replication"}
],
"total":0,
"error":"",
"warning":""
}
PHP
$params = [
    'body' => []
];
$response = $client->nodes()->status($params);         
(
"cluster_name" => "post",
"cluster_post_state_uuid" => "fba97c45-36df-11e9-a84e-eb09d14b8ea7",
"cluster_post_conf_id" => 1,
"cluster_post_status" => "primary",
"cluster_post_size" => 5,
"cluster_post_local_index" => 0,
"cluster_post_node_state" => "synced",
"cluster_post_indexes_count" => 2,
"cluster_post_indexes" => "pq1,pq_posts",
"cluster_post_nodes_set" => "10.10.0.1:9312",
"cluster_post_nodes_view" => "10.10.0.1:9312,10.10.0.1:9320:replication,10.10.1.1:9312,10.10.1.1:9320:replication"
)
Python
utilsApi.sql('SHOW STATUS')
{u'columns': [{u'Key': {u'type': u'string'}},
              {u'Value': {u'type': u'string'}}],
 u'data': [
    {u'Key': u'cluster_name', u'Value': u'post'},
    {u'Key': u'cluster_post_state_uuid', u'Value': u'fba97c45-36df-11e9-a84e-eb09d14b8ea7'},
    {u'Key': u'cluster_post_conf_id', u'Value': u'1'},
    {u'Key': u'cluster_post_status', u'Value': u'primary'},
    {u'Key': u'cluster_post_size', u'Value': u'5'},
    {u'Key': u'cluster_post_local_index', u'Value': u'0'},
    {u'Key': u'cluster_post_node_state', u'Value': u'synced'},
    {u'Key': u'cluster_post_indexes_count', u'Value': u'2'},
    {u'Key': u'cluster_post_indexes', u'Value': u'pq1,pq_posts'},
    {u'Key': u'cluster_post_nodes_set', u'Value': u'10.10.0.1:9312'},
    {u'Key': u'cluster_post_nodes_view', u'Value': u'10.10.0.1:9312,10.10.0.1:9320:replication,10.10.1.1:9312,10.10.1.1:9320:replication'}],
 u'error': u'',
 u'total': 0,
 u'warning': u''}
javascript
res = await utilsApi.sql('SHOW STATUS');
{"columns": [{"Key": {"type": "string"}},
              {"Value": {"type": "string"}}],
 "data": [
    {"Key": "cluster_name", "Value": "post"},
    {"Key": "cluster_post_state_uuid", "Value": "fba97c45-36df-11e9-a84e-eb09d14b8ea7"},
    {"Key": "cluster_post_conf_id", "Value": "1"},
    {"Key": "cluster_post_status", "Value": "primary"},
    {"Key": "cluster_post_size", "Value": "5"},
    {"Key": "cluster_post_local_index", "Value": "0"},
    {"Key": "cluster_post_node_state", "Value": "synced"},
    {"Key": "cluster_post_indexes_count", "Value": "2"},
    {"Key": "cluster_post_indexes", "Value": "pq1,pq_posts"},
    {"Key": "cluster_post_nodes_set", "Value": "10.10.0.1:9312"},
    {"Key": "cluster_post_nodes_view", "Value": "10.10.0.1:9312,10.10.0.1:9320:replication,10.10.1.1:9312,10.10.1.1:9320:replication"}],
 "error": "",
 "total": 0,
 "warning": ""}
Java
utilsApi.sql("SHOW STATUS");
{columns=[{ Key : { type=string }},
              { Value : { type=string }}],
  data : [
    { Key=cluster_name, Value=post},
    { Key=cluster_post_state_uuid, Value=fba97c45-36df-11e9-a84e-eb09d14b8ea7},
    { Key=cluster_post_conf_id, Value=1},
    { Key=cluster_post_status, Value=primary},
    { Key=cluster_post_size, Value=5},
    { Key=cluster_post_local_index, Value=0},
    { Key=cluster_post_node_state, Value=synced},
    { Key=cluster_post_indexes_count, Value=2},
    { Key=cluster_post_indexes, Value=pq1,pq_posts},
    { Key=cluster_post_nodes_set, Value=10.10.0.1:9312},
    { Key=cluster_post_nodes_view, Value=10.10.0.1:9312,10.10.0.1:9320:replication,10.10.1.1:9312,10.10.1.1:9320:replication}],
  error= ,
  total=0,
  warning= }
C#
utilsApi.sql("SHOW STATUS");
{columns=[{ Key : { type=String }},
              { Value : { type=String }}],
  data : [
    { Key=cluster_name, Value=post},
    { Key=cluster_post_state_uuid, Value=fba97c45-36df-11e9-a84e-eb09d14b8ea7},
    { Key=cluster_post_conf_id, Value=1},
    { Key=cluster_post_status, Value=primary},
    { Key=cluster_post_size, Value=5},
    { Key=cluster_post_local_index, Value=0},
    { Key=cluster_post_node_state, Value=synced},
    { Key=cluster_post_indexes_count, Value=2},
    { Key=cluster_post_indexes, Value=pq1,pq_posts},
    { Key=cluster_post_nodes_set, Value=10.10.0.1:9312},
    { Key=cluster_post_nodes_view, Value=10.10.0.1:9312,10.10.0.1:9320:replication,10.10.1.1:9312,10.10.1.1:9320:replication}],
  error="" ,
  total=0,
  warning="" }
¶ 10.3.7

Restarting a cluster

In a multi-master replication cluster, a reference point must be established before other nodes can join and form the cluster. This is called cluster bootstrapping and involves starting a single node as the primary component. Restarting a single node or reconnecting after a shutdown can be done normally.

In case of a full cluster shutdown, the server that was stopped last should be started first with the --new-cluster command line option or by running manticore_new_cluster through systemd. To ensure that the server is capable of being the reference point, the grastate.dat file located at the cluster path should be updated with a value of 1 for the safe_to_bootstrap option. Both conditions, --new-cluster and safe_to_bootstrap=1, must be met. If any other node is started without these options set, an error will occur. The --new-cluster-force command line option can be used to override this protection and start the cluster from another server forcibly. Alternatively, you can run manticore_new_cluster --force to use systemd.

In the event of a hard crash or an unclean shutdown of all servers in the cluster, the most advanced node with the largest seqno in the grastate.dat file located at the cluster path must be identified and started with the --new-cluster-force command line key.

¶ 10.3.8

Cluster recovery

In the event that the Manticore search daemon stops with no remaining nodes in the cluster to serve requests, recovery is necessary. Due to the multi-master nature of the Galera library used for replication, Manticore replication cluster is a single logical entity that maintains the consistency of its nodes and data, and the status of the entire cluster. This allows for safe writes on multiple nodes simultaneously and ensures the integrity of the cluster.

However, this also poses challenges. Let's examine several scenarios, using a cluster of nodes A, B, and C, to see what needs to be done when some or all nodes become unavailable.

Case 1

When node A is stopped, the other nodes receive a "normal shutdown" message. The cluster size is reduced, and a quorum re-calculation takes place.

Upon starting node A, it joins the cluster and will not serve any write transactions until it is fully synchronized with the cluster. If the writeset cache on donor nodes B or C (which can be controlled with the Galera cluster's gcache.size) still contains all of the transactions missed at node A, node A will receive a fast incremental state transfer (IST), that is, a transfer of only missed transactions. If not, a snapshot state transfer (SST) will occur, which involves the transfer of table files.

Case 2

In the scenario where nodes A and B are stopped, the cluster size is reduced to one, with node C forming the primary component to handle write transactions.

Nodes A and B can then be started as usual and will join the cluster after start-up. Node C acts as the donor, providing the state transfer to nodes A and B.

Case 3

All nodes are stopped as usual and the cluster is off.

The problem now is how to initialize the cluster. It's important that on a clean shutdown of searchd the nodes write the number of last executed transaction into the cluster directory grastate.dat file along with flag safe_to_bootstrap. The node which was stopped last will have option safe_to_bootstrap: 1 and the most advanced seqno number.

It is important that this node starts first to form the cluster. To bootstrap a cluster the server should be started on this node with flag --new-cluster. On Linux you can also run manticore_new_cluster which will start Manticore in --new-cluster mode via systemd.

If another node starts first and bootstraps the cluster, then the most advanced node joins that cluster, performs full SST and receives a table file where some transactions are missed in comparison with the table files it got before. That is why it is important to start first the node which was shut down last, it should have flag safe_to_bootstrap: 1 in grastate.dat.

Case 4

In the event of a crash or network failure causing Node A to disappear from the cluster, nodes B and C will attempt to reconnect with Node A. Upon failure, they will remove Node A from the cluster. With two out of the three nodes still running, the cluster maintains its quorum and continues to operate normally.

When Node A is restarted, it will join the cluster automatically, as outlined in Case 1.

Case 5

Nodes A and B have gone offline. Node C is unable to form a quorum on its own as 1 node is less than half of the total nodes (3). As a result, the cluster on node C is shifted to a non-primary state and rejects any write transactions with an error message.

Meanwhile, node C waits for the other nodes to connect and also tries to connect to them. If this happens, and the network is restored and nodes A and B are back online, the cluster will automatically reform. If nodes A and B are just temporarily disconnected from node C but can still communicate with each other, they will continue to operate as normal, as they still form the quorum.

However, if both nodes A and B have crashed or restarted due to a power failure, someone must activate the primary component on node C using the following command:

SQL
SET CLUSTER posts GLOBAL 'pc.bootstrap' = 1
JSON
POST /cli -d "
SET CLUSTER posts GLOBAL 'pc.bootstrap' = 1
"

t's important to note that before executing this command, you must confirm that the other nodes are truly unreachable. Otherwise, a split-brain scenario may occur and separate clusters may form.

Case 6

All nodes have crashed. In this situation, the grastate.dat file in the cluster directory has not been updated and does not contain a valid seqnosequence number.

If this occurs, someone needs to locate the node with the most recent data and start the server on it using the --new-cluster-force command line key. All other nodes will start as normal, as described in Case 3).
On Linux, you can also use the manticore_new_cluster --force, command, which will start Manticore in --new-cluster-force mode via systemd.

Case 7

Split-brain can cause the cluster to transition into a non-primary state. For example, consider a cluster comprised of an even number of nodes (four), such as two pairs of nodes located in different data centers. If a network failure interrupts the connection between the data centers, split-brain occurs as each group of nodes holds exactly half of the quorum. As a result, both groups stop handling write transactions, since the Galera replication model prioritizes data consistency, and the cluster cannot accept write transactions without a quorum. However, nodes in both groups attempt to reconnect with the nodes from the other group in an effort to restore the cluster.

If someone wants to restore the cluster before the network is restored, the same steps outlined in Case 5 hould be taken, but only at one group of nodes.

After the statement is executed, the group with the node that it was run on will be able to handle write transactions once again.

SQL
SET CLUSTER posts GLOBAL 'pc.bootstrap' = 1
JSON
POST /cli -d "
SET CLUSTER posts GLOBAL 'pc.bootstrap' = 1
"

However, it's important to note that if the statement is issued at both groups, it will result in the formation of two separate clusters, and the subsequent network recovery will not result in the groups rejoining.

¶ 11

Connecting to the server

With default configuration, Manticore is waiting for your connections on:

SQL
mysql -h0 -P9306
HTTP
HTTP is a stateless protocol, so it doesn't require any special connection phase:
curl -s "http://localhost:9308/search"
PHP
require_once __DIR__ . '/vendor/autoload.php';
$config = ['host'=>'127.0.0.1','port'=>9308];
$client = new \Manticoresearch\Client($config);
Python
import manticoresearch
config = manticoresearch.Configuration(
    host = "http://127.0.0.1:9308"
)
client =  manticoresearch.ApiClient(config)
indexApi = manticoresearch.IndexApi(client)
searchApi = manticoresearch.searchApi(client)
utilsApi = manticoresearch.UtilsApi(client)
Javascript
var Manticoresearch = require('manticoresearch');
var client= new Manticoresearch.ApiClient()
client.basePath="http://127.0.0.1:9308";
indexApi = new Manticoresearch.IndexApi(client);
searchApi = new Manticoresearch.SearchApi(client);
utilsApi = new Manticoresearch.UtilsApi(client);
Java
import com.manticoresearch.client.ApiClient;
import com.manticoresearch.client.ApiException;
import com.manticoresearch.client.Configuration;
import com.manticoresearch.client.model.*;
import com.manticoresearch.client.api.IndexApi;
import com.manticoresearch.client.api.UtilsApi;
import com.manticoresearch.client.api.SearchApi;

ApiClient client = Configuration.getDefaultApiClient();
client.setBasePath("http://127.0.0.1:9308");

IndexApi indexApi = new IndexApi(client);
SearchApi searchApi = new UtilsApi(client);
UtilsApi utilsApi = new UtilsApi(client);
C#
using ManticoreSearch.Client;
using ManticoreSearch.Api;
using ManticoreSearch.Model;

string basePath = "http://127.0.0.1:9308";
IndexApi indexApi = new IndexApi(basePath);
SearchApi searchApi = new UtilsApi(basePath);
UtilsApi utilsApi = new UtilsApi(basePath);
If you are familiar with Docker, you can use Manticore's [official Docker image](https://github.com/manticoresoftware/docker) to run Manticore. Here is how you can connect to Manticore's docker via MySQL:
docker
Run Manticore container and use built-in MySQL client to connect to the node.
docker run -e EXTRA=1 --name manticore -d manticoresearch/manticore && docker exec -it manticore mysql
¶ 11.1

MySQL protocol

Manticore Search implements an SQL interface using the MySQL protocol, allowing any MySQL library or connector and many MySQL clients to be used to connect to Manticore Search and work with it as if it were a MySQL server, not Manticore.

However, the SQL dialect is different and implements only a subset of the SQL commands or functions available in MySQL. Additionally, there are clauses and functions that are specific to Manticore Search, such as the MATCH() clause for full-text search.

Manticore Search does not support server-side prepared statements, but client-side prepared statements can be used. It is important to note that Manticore implements the multi-value (MVA) data type, which has no equivalent in MySQL or libraries implementing prepared statements. In these cases, the MVA values must be crafted in the raw query.

Some MySQL clients/connectors require values for user/password and/or database name. Since Manticore Search does not have the concept of databases and there is no user access control implemented, these values can be set arbitrarily as Manticore will simply ignore them.

Configuration

The default port for the SQL interface is 9306 and it's enabled by default.

You can configure the MySQL port in the searchd section of the configuration file using the listen directive like this:

searchd {
...
   listen = 127.0.0.1:9306:mysql
...
}

Keep in mind that Manticore doesn't have user authentication, so make sure that the MySQL port is not accessible to anyone outside of your network.

VIP connection

A separate MySQL port can be used for performing "VIP" connections. When connecting to this port, the thread pool is bypassed, and a new dedicated thread is always created. This is useful in cases of severe overload, where the server would either stall or prevent a connection through the regular port.

searchd {
...
   listen = 127.0.0.1:9306:mysql
   listen = 127.0.0.1:9307:mysql_vip
...
}

Connecting via standard MySQL client

The easiest way to connect to Manticore is by using a standard MySQL client:

mysql -P9306 -h0

Secured MySQL connection

The MySQL protocol supports SSL encryption. Secure connections can be made on the same mysql listening port.

Compressed MySQL connection

Compression can be used with MySQL connections and is available to clients by default. The client just needs to specify that the connection should use compression.

An example using the MySQL client:

mysql -P9306 -h0 -C

Compression can be used in both secured and non-secured connections.

Notes on MySQL connectors

The official MySQL connectors can be used to connect to Manticore Search, however they might require certain settings passed in the DSN string as the connector can try running certain SQL commands not implemented yet in Manticore.

JDBC Connector 6.x and above require Manticore Search 2.8.2 or greater and the DSN string should contain the following options:

jdbc:mysql://IP:PORT/DB/?characterEncoding=utf8&maxAllowedPacket=512000&serverTimezone=XXX

By default Manticore Search will report it's own version to the connector, however this may cause some troubles. To overcome that mysql_version_string directive in searchd section of the configuration should be set to a version lower than 5.1.1:

searchd {
...
   mysql_version_string = 5.0.37
...
}

.NET MySQL connector uses connection pools by default. To correctly get the statistics of SHOW META, queries along with SHOW META command should be sent as a single multistatement (SELECT ...;SHOW META). If pooling is enabled option Allow Batch=True is required to be added to the connection string to allow multistatements:

Server=127.0.0.1;Port=9306;Database=somevalue;Uid=somevalue;Pwd=;Allow Batch=True;

Notes on ODBC connectivity

Manticore can be accessed using ODBC. It's recommended to set charset=UTF8 in the ODBC string. Some ODBC drivers will not like the reported version by the Manticore server as they will see it as a very old MySQL server. This can be overridden with mysql_version_string option.

Comment syntax

Manticore SQL over MySQL supports C-style comment syntax. Everything from an opening /* sequence to a closing */ sequence is ignored. Comments can span multiple lines, can not nest, and should not get logged. MySQL specific /*! ... */ comments are also currently ignored. (As the comments support was rather added for better compatibility with mysqldump produced dumps, rather than improving general query interoperability between Manticore and MySQL.)

SELECT /*! SQL_CALC_FOUND_ROWS */ col1 FROM table1 WHERE ...
¶ 11.2

HTTP

You can connect to Manticore Search through HTTP/HTTPS.

Configuration

By default, Manticore listens for HTTP, HTTPS, and binary requests on ports 9308 and 9312.

In the "searchd" section of your configuration file, you can define the HTTP port using the listen directive as follows:

Both lines are valid and have the same meaning (except for the port number). They both define listeners that will serve all API/HTTP/HTTPS protocols. There are no special requirements, and any HTTP client can be used to connect to Manticore.

HTTP
searchd {
...
   listen = 127.0.0.1:9308
   listen = 127.0.0.1:9312:http
...
}

All HTTP endpoints return application/json content type. For the most part, endpoints use JSON payloads for requests. However, there are some exceptions that use NDJSON or simple URL-encoded payloads.

Currently, there is no user authentication. Therefore, make sure that the HTTP interface is not accessible to anyone outside your network. As Manticore functions like any other web server, you can use a reverse proxy, such as Nginx, to implement HTTP authentication or caching.

The HTTP protocol also supports SSL encryption:
If you specify :https instead of :http only secured connections will be accepted. Otherwise in case of no valid key/certificate provided, but the client trying to connect via https - the connection will be dropped. If you make not HTTPS, but an HTTP request to 9443 it will respond with HTTP code 400.

HTTPS
searchd {
...
   listen = 127.0.0.1:9308
   listen = 127.0.0.1:9443:https
...
}

VIP Connection

Separate HTTP interface can be used for 'VIP' connections. In this case, the connection bypasses a thread pool and always creates a new dedicated thread. This is useful for managing Manticore Search during periods of severe overload when the server might stall or not allow regular port connections.

For more information on the listen directive, see this section.

VIP
searchd {
...
   listen = 127.0.0.1:9308
   listen = 127.0.0.1:9318:_vip
...
}

SQL over HTTP

Endpoints /sql and /cli allow running SQL queries via HTTP.

/sql

/sql accepts an SQL SELECT query via HTTP JSON interface.

Query payload must be URL encoded, otherwise query statements with = (filtering or setting options) will result in an error.

It returns a JSON response which contains hits information and execution time. The response has the same format as json/search endpoint. Note, that /sql endpoint supports only single search requests. If you are looking for processing a multi-query see below.

HTTP
POST /sql -d "query=select%20id%2Csubject%2Cauthor_id%20%20from%20forum%20where%20match%28%27%40subject%20php%20manticore%27%29%20group%20by%20author_id%20order%20by%20id%20desc%20limit%200%2C5"
{
  "took": 0,
  "timed_out": false,
  "hits": {
    "total": 2,
    "total_relation": "eq",
    "hits": [
      {
        "_id": "2",
        "_score": 2356,
        "_source": {
          "subject": "php manticore",
          "author_id": 12
        }
      },
      {
        "_id": "1",
        "_score": 2356,
        "_source": {
          "subject": "php manticore",
          "author_id": 11
        }
      }
    ]
  }
}

/sql?mode=raw

/sql endpoint also has a special mode "raw", which allows to send any valid sphinxql queries including multi-queries. The returned value is a json array of one or more result sets.

HTTP
POST /sql?mode=raw -d "query=desc%20test"
[
  {
    "columns": [
      {
        "Field": {
          "type": "string"
        }
      },
      {
        "Type": {
          "type": "string"
        }
      },
      {
        "Properties": {
          "type": "string"
        }
      }
    ],
    "data": [
      {
        "Field": "id",
        "Type": "bigint",
        "Properties": ""
      },
      {
        "Field": "title",
        "Type": "text",
        "Properties": "indexed"
      },
      {
        "Field": "gid",
        "Type": "uint",
        "Properties": ""
      },
      {
        "Field": "title",
        "Type": "string",
        "Properties": ""
      },
      {
        "Field": "j",
        "Type": "json",
        "Properties": ""
      },
      {
        "Field": "new1",
        "Type": "uint",
        "Properties": ""
      }
    ],
    "total": 6,
    "error": "",
    "warning": ""
  }
]

/cli

While the /sql endpoint is useful to control Manticore programmatically from your application, there's also endpoint /cli which makes it easier to maintain a Manticore instance via curl or your browser manually. It accepts POST and GET HTTP methods. Everything after /cli? is taken by Manticore as is, even if you don't escape it manually via curl or let the browser encode it automatically. The + sign is not decoded to a space as well, eliminating the necessity of encoding it. The response format is tabular, similar to the one returned by MySQL console.

HTTP
POST /cli -d "desc test"
+-------+--------+----------------+
| Field | Type   | Properties     |
+-------+--------+----------------+
| id    | bigint |                |
| body  | text   | indexed stored |
| title | string |                |
+-------+--------+----------------+
3 rows in set (0.001 sec)
Browser
![using /cli in browser](cli_browser.png)

/cli_json

The /cli_json endpoint provides the same functionality as /cli , but returns the response in JSON format.

HTTP
POST /cli_json -d "desc test"
[{
"columns":[{"Field":{"type":"string"}},{"Type":{"type":"string"}},{"Properties":{"type":"string"}}],
"data":[
{"Field":"id","Type":"bigint","Properties":""},
{"Field":"body","Type":"text","Properties":"indexed stored"},
{"Field":"title","Type":"string","Properties":""}
],
"total":3,
"error":"",
"warning":""
}]

Keep-alive

HTTP keep-alive is also supported, which makes working via the HTTP JSON interface stateful as long as the client supports keep-alive too. For example, using the new /cli endpoint you can call SHOW META after SELECT and it will work the same way it works via mysql.

¶ 11.3

HTTP

You can connect to Manticore Search through HTTP/HTTPS.

Configuration

By default, Manticore listens for HTTP, HTTPS, and binary requests on ports 9308 and 9312.

In the "searchd" section of your configuration file, you can define the HTTP port using the listen directive as follows:

Both lines are valid and have the same meaning (except for the port number). They both define listeners that will serve all API/HTTP/HTTPS protocols. There are no special requirements, and any HTTP client can be used to connect to Manticore.

HTTP
searchd {
...
   listen = 127.0.0.1:9308
   listen = 127.0.0.1:9312:http
...
}

All HTTP endpoints return application/json content type. For the most part, endpoints use JSON payloads for requests. However, there are some exceptions that use NDJSON or simple URL-encoded payloads.

Currently, there is no user authentication. Therefore, make sure that the HTTP interface is not accessible to anyone outside your network. As Manticore functions like any other web server, you can use a reverse proxy, such as Nginx, to implement HTTP authentication or caching.

The HTTP protocol also supports SSL encryption:
If you specify :https instead of :http only secured connections will be accepted. Otherwise in case of no valid key/certificate provided, but the client trying to connect via https - the connection will be dropped. If you make not HTTPS, but an HTTP request to 9443 it will respond with HTTP code 400.

HTTPS
searchd {
...
   listen = 127.0.0.1:9308
   listen = 127.0.0.1:9443:https
...
}

VIP Connection

Separate HTTP interface can be used for 'VIP' connections. In this case, the connection bypasses a thread pool and always creates a new dedicated thread. This is useful for managing Manticore Search during periods of severe overload when the server might stall or not allow regular port connections.

For more information on the listen directive, see this section.

VIP
searchd {
...
   listen = 127.0.0.1:9308
   listen = 127.0.0.1:9318:_vip
...
}

SQL over HTTP

Endpoints /sql and /cli allow running SQL queries via HTTP.

/sql

/sql accepts an SQL SELECT query via HTTP JSON interface.

Query payload must be URL encoded, otherwise query statements with = (filtering or setting options) will result in an error.

It returns a JSON response which contains hits information and execution time. The response has the same format as json/search endpoint. Note, that /sql endpoint supports only single search requests. If you are looking for processing a multi-query see below.

HTTP
POST /sql -d "query=select%20id%2Csubject%2Cauthor_id%20%20from%20forum%20where%20match%28%27%40subject%20php%20manticore%27%29%20group%20by%20author_id%20order%20by%20id%20desc%20limit%200%2C5"
{
  "took": 0,
  "timed_out": false,
  "hits": {
    "total": 2,
    "total_relation": "eq",
    "hits": [
      {
        "_id": "2",
        "_score": 2356,
        "_source": {
          "subject": "php manticore",
          "author_id": 12
        }
      },
      {
        "_id": "1",
        "_score": 2356,
        "_source": {
          "subject": "php manticore",
          "author_id": 11
        }
      }
    ]
  }
}

/sql?mode=raw

/sql endpoint also has a special mode "raw", which allows to send any valid sphinxql queries including multi-queries. The returned value is a json array of one or more result sets.

HTTP
POST /sql?mode=raw -d "query=desc%20test"
[
  {
    "columns": [
      {
        "Field": {
          "type": "string"
        }
      },
      {
        "Type": {
          "type": "string"
        }
      },
      {
        "Properties": {
          "type": "string"
        }
      }
    ],
    "data": [
      {
        "Field": "id",
        "Type": "bigint",
        "Properties": ""
      },
      {
        "Field": "title",
        "Type": "text",
        "Properties": "indexed"
      },
      {
        "Field": "gid",
        "Type": "uint",
        "Properties": ""
      },
      {
        "Field": "title",
        "Type": "string",
        "Properties": ""
      },
      {
        "Field": "j",
        "Type": "json",
        "Properties": ""
      },
      {
        "Field": "new1",
        "Type": "uint",
        "Properties": ""
      }
    ],
    "total": 6,
    "error": "",
    "warning": ""
  }
]

/cli

While the /sql endpoint is useful to control Manticore programmatically from your application, there's also endpoint /cli which makes it easier to maintain a Manticore instance via curl or your browser manually. It accepts POST and GET HTTP methods. Everything after /cli? is taken by Manticore as is, even if you don't escape it manually via curl or let the browser encode it automatically. The + sign is not decoded to a space as well, eliminating the necessity of encoding it. The response format is tabular, similar to the one returned by MySQL console.

HTTP
POST /cli -d "desc test"
+-------+--------+----------------+
| Field | Type   | Properties     |
+-------+--------+----------------+
| id    | bigint |                |
| body  | text   | indexed stored |
| title | string |                |
+-------+--------+----------------+
3 rows in set (0.001 sec)
Browser
![using /cli in browser](cli_browser.png)

/cli_json

The /cli_json endpoint provides the same functionality as /cli , but returns the response in JSON format.

HTTP
POST /cli_json -d "desc test"
[{
"columns":[{"Field":{"type":"string"}},{"Type":{"type":"string"}},{"Properties":{"type":"string"}}],
"data":[
{"Field":"id","Type":"bigint","Properties":""},
{"Field":"body","Type":"text","Properties":"indexed stored"},
{"Field":"title","Type":"string","Properties":""}
],
"total":3,
"error":"",
"warning":""
}]

Keep-alive

HTTP keep-alive is also supported, which makes working via the HTTP JSON interface stateful as long as the client supports keep-alive too. For example, using the new /cli endpoint you can call SHOW META after SELECT and it will work the same way it works via mysql.

¶ 12

Data creation and modification

You can add, update, replace, and delete your indexed data using different ways provided by Manticore. Manticore supports working with external storages such as databases, XML, CSV, and TSV documents. For insert and delete operations, a transaction mechanism is supported.

Also, for insert and replace queries, Manticore supports Elasticsearch-like query format along with its own format. For details, see the corresponding examples in the Adding documents to a real-time table and REPLACE sections.

¶ 12.1

Adding documents to a table

¶ 12.1.1

Adding documents to a real-time table

If you're looking for information on adding documents to a plain table, please refer to the section on adding data from external storages.

Adding documents in real-time is supported only for Real-Time and percolate tables. The corresponding SQL command, HTTP endpoint, or client functions insert new rows (documents) into a table with the provided field values. It's not necessary for a table to exist before adding documents to it. If the table doesn't exist, Manticore will attempt to create it automatically. For more information, see Auto schema.

You can insert a single or multiple documents with values for all fields of the table or just a portion of them. In this case, the other fields will be filled with their default values (0 for scalar types, an empty string for text types).

Expressions are not currently supported in INSERT, so values must be explicitly specified.

The ID field/value can be omitted, as RT and PQ tables support auto-id functionality. You can also use 0 as the id value to force automatic ID generation. Rows with duplicate IDs will not be overwritten by INSERT. Instead, you can use REPLACE for that purpose.

When using the HTTP JSON protocol, you have two different request formats to choose from: a common Manticore format and an Elasticsearch-like format. Both formats are demonstrated in the examples below.

Additionally, when using the Manticore JSON request format, keep in mind that the doc node is required, and all the values should be provided within it.

SQL
General syntax:
INSERT INTO <table name> [(column, ...)]
VALUES (value, ...)
[, (...)]
INSERT INTO products(title,price) VALUES ('Crossbody Bag with Tassel', 19.85);
INSERT INTO products(title) VALUES ('Crossbody Bag with Tassel');
INSERT INTO products VALUES (0,'Yellow bag', 4.95);
Query OK, 1 rows affected (0.00 sec)
Query OK, 1 rows affected (0.00 sec)
Query OK, 1 rows affected (0.00 sec)
JSON
POST /insert
{
  "index":"products",
  "id":1,
  "doc":
  {
    "title" : "Crossbody Bag with Tassel",
    "price" : 19.85
  }
}

POST /insert
{
  "index":"products",
  "id":2,
  "doc":
  {
    "title" : "Crossbody Bag with Tassel"
  }
}

POST /insert
{
  "index":"products",
  "id":0,
  "doc":
  {
    "title" : "Yellow bag"
  }
}
{
  "_index": "products",
  "_id": 1,
  "created": true,
  "result": "created",
  "status": 201
}
{
  "_index": "products",
  "_id": 2,
  "created": true,
  "result": "created",
  "status": 201
}
{
  "_index": "products",
  "_id": 0,
  "created": true,
  "result": "created",
  "status": 201
}
Elasticsearch
POST /products/_create/3
{
  "title": "Yellow Bag with Tassel",
  "price": 19.85
}

POST /products/_create/
{
  "title": "Red Bag with Tassel",
  "price": 19.85
}
{
"_id":3,
"_index":"products",
"_primary_term":1,
"_seq_no":0,
"_shards":{
    "failed":0,
    "successful":1,
    "total":1
},
"_type":"_doc",
"_version":1,
"result":"updated"
}
{
"_id":2235747273424240642,
"_index":"products",
"_primary_term":1,
"_seq_no":0,
"_shards":{
    "failed":0,
    "successful":1,
    "total":1
},
"_type":"_doc",
"_version":1,
"result":"updated"
}
PHP
$index->addDocuments([
        ['id' => 1, 'title' => 'Crossbody Bag with Tassel', 'price' => 19.85]
]);
$index->addDocuments([
        ['id' => 2, 'title' => 'Crossbody Bag with Tassel']
]);
$index->addDocuments([
        ['id' => 0, 'title' => 'Yellow bag']
]);
Python
indexApi.insert({"index" : "test", "id" : 1, "doc" : {"title" : "Crossbody Bag with Tassel", "price" : 19.85}})
indexApi.insert({"index" : "test", "id" : 2, "doc" : {"title" : "Crossbody Bag with Tassel"}})
indexApi.insert({"index" : "test", "id" : 0, "doc" : {{"title" : "Yellow bag"}})
Javascript
res = await indexApi.insert({"index" : "test", "id" : 1, "doc" : {"title" : "Crossbody Bag with Tassel", "price" : 19.85}});
res = await indexApi.insert({"index" : "test", "id" : 2, "doc" : {"title" : "Crossbody Bag with Tassel"}});
res = await indexApi.insert({"index" : "test", "id" : 0, "doc" : {{"title" : "Yellow bag"}});
Java
InsertDocumentRequest newdoc = new InsertDocumentRequest();
HashMap<String,Object> doc = new HashMap<String,Object>(){{
    put("title","Crossbody Bag with Tassel");
    put("price",19.85);
}};
newdoc.index("products").id(1L).setDoc(doc);
sqlresult = indexApi.insert(newdoc);

newdoc = new InsertDocumentRequest();
HashMap<String,Object> doc = new HashMap<String,Object>(){{
    put("title","Crossbody Bag with Tassel");
}};
newdoc.index("products").id(2L).setDoc(doc);
sqlresult = indexApi.insert(newdoc);

newdoc = new InsertDocumentRequest();
HashMap<String,Object> doc = new HashMap<String,Object>(){{
    put("title","Yellow bag");
 }};
newdoc.index("products").id(0L).setDoc(doc);
sqlresult = indexApi.insert(newdoc);
C#
Dictionary<string, Object> doc = new Dictionary<string, Object>(); 
doc.Add("title", "Crossbody Bag with Tassel");
doc.Add("price", 19.85);
InsertDocumentRequest newdoc = new InsertDocumentRequest(index: "products", id: 1, doc: doc);
var sqlresult = indexApi.Insert(newdoc);

doc = new Dictionary<string, Object>(); 
doc.Add("title", "Crossbody Bag with Tassel");
newdoc = new InsertDocumentRequest(index: "products", id: 2, doc: doc);
sqlresult = indexApi.Insert(newdoc);

doc = new Dictionary<string, Object>(); 
doc.Add("title", "Yellow bag");
newdoc = new InsertDocumentRequest(index: "products", id: 0, doc: doc);
sqlresult = indexApi.Insert(newdoc);

Auto schema

Manticore features an automatic table creation mechanism, which activates when a specified table in the insert query doesn't yet exist. This mechanism is enabled by default. To disable it, set auto_schema = 0 in the Searchd section of your Manticore config file.

By default, all text values in the VALUES clause are considered to be of the text type, except for values representing valid email addresses, which are treated as the string type.

If you attempt to INSERT multiple rows with different, incompatible value types for the same field, auto table creation will be canceled, and an error message will be returned. However, if the different value types are compatible, the resulting field type will be the one that accommodates all the values. Some automatic data type conversions that may occur include:

Keep in mind that the /bulk HTTP endpoint does not support automatic table creation (auto schema). Only the /_bulk (Elasticsearch-like) HTTP endpoint and the SQL interface support this feature.

SQL
MySQL [(none)]> drop table if exists t; insert into t(i,f,t,s,j,b,m,mb) values(123,1.2,'text here','test@mail.com','{"a": 123}',1099511627776,(1,2),(1099511627776,1099511627777)); desc t; select * from t;
--------------
drop table if exists t
--------------

Query OK, 0 rows affected (0.42 sec)

--------------
insert into t(i,f,t,j,b,m,mb) values(123,1.2,'text here','{"a": 123}',1099511627776,(1,2),(1099511627776,1099511627777))
--------------

Query OK, 1 row affected (0.00 sec)

--------------
desc t
--------------

+-------+--------+----------------+
| Field | Type   | Properties     |
+-------+--------+----------------+
| id    | bigint |                |
| t     | text   | indexed stored |
| s     | string |                |
| j     | json   |                |
| i     | uint   |                |
| b     | bigint |                |
| f     | float  |                |
| m     | mva    |                |
| mb    | mva64  |                |
+-------+--------+----------------+
8 rows in set (0.00 sec)

--------------
select * from t
--------------

+---------------------+------+---------------+----------+------+-----------------------------+-----------+---------------+------------+
| id                  | i    | b             | f        | m    | mb                          | t         | s             | j          |
+---------------------+------+---------------+----------+------+-----------------------------+-----------+---------------+------------+
| 5045949922868723723 |  123 | 1099511627776 | 1.200000 | 1,2  | 1099511627776,1099511627777 | text here | test@mail.com | {"a": 123} |
+---------------------+------+---------------+----------+------+-----------------------------+-----------+---------------+------------+
1 row in set (0.00 sec)
JSON
POST /insert  -d
{
 "index":"t",
 "id": 2,
 "doc":
 {
   "i" : 123,
   "f" : 1.23,
   "t": "text here",
   "s": "test@mail.com",
   "j": {"a": 123},
   "b": 1099511627776,
   "m": [1,2],
   "mb": [1099511627776,1099511627777]
 }
}
{"_index":"t","_id":2,"created":true,"result":"created","status":201}

Auto ID

Manticore provides an auto ID generation functionality for the column ID of documents inserted or replaced into a real-time or Percolate table. The generator produces a unique ID for a document with some guarantees, but it should not be considered an auto-incremented ID.

The generated ID value is guaranteed to be unique under the following conditions:

The auto ID generator creates a 64-bit integer for a document ID and uses the following schema:

This schema ensures that the generated ID is unique among all nodes in the cluster and that data inserted into different cluster nodes does not create collisions between the nodes.

As a result, the first ID from the generator used for auto ID is NOT 1 but a larger number. Additionally, the document stream inserted into a table might have non-sequential ID values if inserts into other tables occur between calls, as the ID generator is singular in the server and shared between all its tables.

SQL
INSERT INTO products(title,price) VALUES ('Crossbody Bag with Tassel', 19.85);
INSERT INTO products VALUES (0,'Yello bag', 4.95);
select * from products;
+---------------------+-----------+---------------------------+
| id                  | price     | title                     |
+---------------------+-----------+---------------------------+
| 1657860156022587404 | 19.850000 | Crossbody Bag with Tassel |
| 1657860156022587405 |  4.950000 | Yello bag                 |
+---------------------+-----------+---------------------------+
JSON
POST /insert
{
  "index":"products",
  "id":0,
  "doc":
  {
    "title" : "Yellow bag"
  }
}

GET /search
{
  "index":"products",
  "query":{
    "query_string":""
  }
}
{
  "took": 0,
  "timed_out": false,
  "hits": {
    "total": 1,
    "hits": [
      {
        "_id": "1657860156022587406",
        "_score": 1,
        "_source": {
          "price": 0,
          "title": "Yellow bag"
        }
      }
    ]
  }
}
PHP
$index->addDocuments([
        ['id' => 0, 'title' => 'Yellow bag']
]);
Python
indexApi.insert({"index" : "products", "id" : 0, "doc" : {"title" : "Yellow bag"}})
Javascript
res = await indexApi.insert({"index" : "products", "id" : 0, "doc" : {"title" : "Yellow bag"}});
Java
newdoc = new InsertDocumentRequest();
HashMap<String,Object> doc = new HashMap<String,Object>(){{
    put("title","Yellow bag");
 }};
newdoc.index("products").id(0L).setDoc(doc);
sqlresult = indexApi.insert(newdoc);
C#
Dictionary<string, Object> doc = new Dictionary<string, Object>(); 
doc.Add("title", "Yellow bag");
InsertDocumentRequest newdoc = new InsertDocumentRequest(index: "products", id: 0, doc: doc);
var sqlresult = indexApi.Insert(newdoc);

Bulk adding documents

You can insert not just a single document into a real-time table, but as many as you'd like. It's perfectly fine to insert batches of tens of thousands of documents into a real-time table. However, it's important to keep the following points in mind:

Note that the /bulk HTTP endpoint does not support automatic creation of tables (auto schema). Only the /_bulk (Elasticsearch-like) HTTP endpoint and the SQL interface support this feature.

Chunked transfer in /bulk

The /bulk (Manticore mode) endpoint supports Chunked transfer encoding. You can use it to transmit large batches. It:

Bulk insert examples

SQL
For bulk insert, simply provide more documents in brackets after `VALUES()`. The syntax is:
INSERT INTO <table name>[(column1, column2, ...)] VALUES ()[,(value1,[value2, ...])]
The optional column name list allows you to explicitly specify values for some of the columns present in the table. All other columns will be filled with their default values (0 for scalar types, empty string for string types). For example:
INSERT INTO products(title,price) VALUES ('Crossbody Bag with Tassel', 19.85), ('microfiber sheet set', 19.99), ('Pet Hair Remover Glove', 7.99);
Query OK, 3 rows affected (0.01 sec)
Expressions are currently not supported in `INSERT`, and values should be explicitly specified.
JSON
The syntax is generally the same as for [inserting a single document](#Add-documents). Just provide more lines, one for each document, and use the `/bulk` endpoint instead of `/insert`. Enclose each document in the "insert" node. Note that it also requires: * `Content-Type: application/x-ndjson` * The data should be formatted as newline-delimited JSON (NDJSON). Essentially, this means that each line should contain exactly one JSON statement and end with a newline `\n` and possibly `\r`. The `/bulk` endpoint supports 'insert', 'replace', 'delete', and 'update' queries. Keep in mind that you can direct operations to multiple tables, but transactions are only possible for a single table. If you specify more, Manticore will gather operations directed to one table into a single transaction. When the table changes, it will commit the collected operations and initiate a new transaction on the new table. An empty line separating batches also leads to committing the previous batch and starting a new transaction. In the response for a `/bulk` request, you can find the following fields: * "errors": shows whether any errors occurred (true/false) * "error": describes the error that took place * "current_line": the line number where execution stopped (or failed); empty lines, including the first empty line, are also counted * "skipped_lines": the count of non-committed lines, beginning from the `current_line` and moving backward
POST /bulk
-H "Content-Type: application/x-ndjson" -d '
{"insert": {"index":"products", "id":1, "doc":  {"title":"Crossbody Bag with Tassel","price" : 19.85}}}
{"insert":{"index":"products", "id":2, "doc":  {"title":"microfiber sheet set","price" : 19.99}}}
'

POST /bulk
-H "Content-Type: application/x-ndjson" -d '
{"insert":{"index":"test1","id":21,"doc":{"int_col":1,"price":1.1,"title":"bulk doc one"}}}
{"insert":{"index":"test1","id":22,"doc":{"int_col":2,"price":2.2,"title":"bulk doc two"}}}

{"insert":{"index":"test1","id":23,"doc":{"int_col":3,"price":3.3,"title":"bulk doc three"}}}
{"insert":{"index":"test2","id":24,"doc":{"int_col":4,"price":4.4,"title":"bulk doc four"}}}
{"insert":{"index":"test2","id":25,"doc":{"int_col":5,"price":5.5,"title":"bulk doc five"}}}
'
{
  "items": [
    {
      "bulk": {
        "_index": "products",
        "_id": 2,
        "created": 2,
        "deleted": 0,
        "updated": 0,
        "result": "created",
        "status": 201
      }
    }
  ],
  "current_line": 4,
  "skipped_lines": 0,
  "errors": false,
  "error": ""
}

{
  "items": [
    {
      "bulk": {
        "_index": "test1",
        "_id": 22,
        "created": 2,
        "deleted": 0,
        "updated": 0,
        "result": "created",
        "status": 201
      }
    },
    {
      "bulk": {
        "_index": "test1",
        "_id": 23,
        "created": 1,
        "deleted": 0,
        "updated": 0,
        "result": "created",
        "status": 201
      }
    },
    {
      "bulk": {
        "_index": "test2",
        "_id": 25,
        "created": 2,
        "deleted": 0,
        "updated": 0,
        "result": "created",
        "status": 201
      }
    }
  ],
  "current_line": 8,
  "skipped_lines": 0,
  "errors": false,
  "error": ""
}
Elasticsearch
POST /_bulk
-H "Content-Type: application/x-ndjson" -d '
{ "index" : { "_index" : "products" } }
{ "title" : "Yellow Bag", "price": 12 }
{ "create" : { "_index" : "products" } }
{ "title" : "Red Bag", "price": 12.5, "id": 3 }
'
{
  "items": [
    {
      "index": {
        "_index": "products",
        "_type": "doc",
        "_id": "0",
        "_version": 1,
        "result": "created",
        "_shards": {
          "total": 1,
          "successful": 1,
          "failed": 0
        },
        "_seq_no": 0,
        "_primary_term": 1,
        "status": 201
      }
    },
    {
      "create": {
        "_index": "products",
        "_type": "doc",
        "_id": "3",
        "_version": 1,
        "result": "created",
        "_shards": {
          "total": 1,
          "successful": 1,
          "failed": 0
        },
        "_seq_no": 0,
        "_primary_term": 1,
        "status": 201
      }
    }
  ],
  "errors": false,
  "took": 1
}
PHP
Use method addDocuments():
$index->addDocuments([
        ['id' => 1, 'title' => 'Crossbody Bag with Tassel', 'price' => 19.85],
        ['id' => 2, 'title' => 'microfiber sheet set', 'price' => 19.99],
        ['id' => 3, 'title' => 'Pet Hair Remover Glove', 'price' => 7.99]
]);
Python
docs = [ \
    {"insert": {"index" : "products", "id" : 1, "doc" : {"title" : "Crossbody Bag with Tassel", "price" : 19.85}}}, \
    {"insert": {"index" : "products", "id" : 2, "doc" : {"title" : "microfiber sheet set", "price" : 19.99}}}, \
    {"insert": {"index" : "products", "id" : 3, "doc" : {"title" : "CPet Hair Remover Glove", "price" : 7.99}}}
]
res = indexApi.bulk('\n'.join(map(json.dumps,docs)))
Javascript
let docs = [
    {"insert": {"index" : "products", "id" : 3, "doc" : {"title" : "Crossbody Bag with Tassel", "price" : 19.85}}},
    {"insert": {"index" : "products", "id" : 4, "doc" : {"title" : "microfiber sheet set", "price" : 19.99}}},
    {"insert": {"index" : "products", "id" : 5, "doc" : {"title" : "CPet Hair Remover Glove", "price" : 7.99}}}
];
res =  await indexApi.bulk(docs.map(e=>JSON.stringify(e)).join('\n'));
Java
String body = "{\"insert\": {\"index\" : \"products\", \"id\" : 1, \"doc\" : {\"title\" : \"Crossbody Bag with Tassel\", \"price\" : 19.85}}}"+"\n"+
    "{\"insert\": {\"index\" : \"products\", \"id\" : 4, \"doc\" : {\"title\" : \"microfiber sheet set\", \"price\" : 19.99}}}"+"\n"+
    "{\"insert\": {\"index\" : \"products\", \"id\" : 5, \"doc\" : {\"title\" : \"CPet Hair Remover Glove\", \"price\" : 7.99}}}"+"\n";         
BulkResponse bulkresult = indexApi.bulk(body);
C#
string body = "{\"insert\": {\"index\" : \"products\", \"id\" : 1, \"doc\" : {\"title\" : \"Crossbody Bag with Tassel\", \"price\" : 19.85}}}"+"\n"+
    "{\"insert\": {\"index\" : \"products\", \"id\" : 4, \"doc\" : {\"title\" : \"microfiber sheet set\", \"price\" : 19.99}}}"+"\n"+
    "{\"insert\": {\"index\" : \"products\", \"id\" : 5, \"doc\" : {\"title\" : \"CPet Hair Remover Glove\", \"price\" : 7.99}}}"+"\n";                                 
BulkResponse bulkresult = indexApi.Bulk(string.Join("\n", docs));

Inserting multi-value attributes (MVA) values

Multi-value attributes (MVA) are inserted as arrays of numbers.

Examples

SQL
INSERT INTO products(title, sizes) VALUES('shoes', (40,41,42,43));
JSON
POST /insert
{
  "index":"products",
  "id":1,
  "doc":
  {
    "title" : "shoes",
    "sizes" : [40, 41, 42, 43]
  }
}
Elasticsearch
POST /products/_create/1
{
  "title": "shoes",
  "sizes" : [40, 41, 42, 43]
}
Or, alternatively
POST /products/_doc/
{
  "title": "shoes",
  "sizes" : [40, 41, 42, 43]
}
PHP
$index->addDocument(
  ['title' => 'shoes', 'sizes' => [40,41,42,43]],
  1
);
Python
indexApi.insert({"index" : "products", "id" : 0, "doc" : {"title" : "Yellow bag","sizes":[40,41,42,43]}})
Javascript
res = await indexApi.insert({"index" : "products", "id" : 0, "doc" : {"title" : "Yellow bag","sizes":[40,41,42,43]}});
Java
newdoc = new InsertDocumentRequest();
HashMap<String,Object> doc = new HashMap<String,Object>(){{
    put("title","Yellow bag");
    put("sizes",new int[]{40,41,42,43});
 }};
newdoc.index("products").id(0L).setDoc(doc);
sqlresult = indexApi.insert(newdoc);
C#
Dictionary<string, Object> doc = new Dictionary<string, Object>(); 
doc.Add("title", "Yellow bag");
doc.Add("sizes", new List<Object> {40,41,42,43});
InsertDocumentRequest newdoc = new InsertDocumentRequest(index: "products", id: 0, doc: doc);
var sqlresult = indexApi.Insert(newdoc);

Inserting JSON

JSON value can be inserted as an escaped string (via SQL or JSON) or as a JSON object (via the JSON interface).

Examples

SQL
INSERT INTO products VALUES (1, 'shoes', '{"size": 41, "color": "red"}');
JSON
JSON value can be inserted as a JSON object
POST /insert
{
  "index":"products",
  "id":1,
  "doc":
  {
    "title" : "shoes",
    "meta" : {
      "size": 41,
      "color": "red"
    }
  }
}
JSON value can be also inserted as a string containing escaped JSON:
POST /insert
{
  "index":"products",
  "id":1,
  "doc":
  {
    "title" : "shoes",
    "meta" : "{\"size\": 41, \"color\": \"red\"}"
  }
}
Elasticsearch
POST /products/_create/1
{
  "title": "shoes",
  "meta" : {
    "size": 41,
    "color": "red"
  }
}
Or, alternatively
POST /products/_doc/
{
  "title": "shoes",
  "meta" : {
    "size": 41,
    "color": "red"
  }
}
Consider JSON just as string:
PHP
$index->addDocument(
  ['title' => 'shoes', 'meta' => '{"size": 41, "color": "red"}'],
  1
);
Python
indexApi = api = manticoresearch.IndexApi(client)
indexApi.insert({"index" : "products", "id" : 0, "doc" : {"title" : "Yellow bag","meta":'{"size": 41, "color": "red"}'}})
Javascript
res = await indexApi.insert({"index" : "products", "id" : 0, "doc" : {"title" : "Yellow bag","meta":'{"size": 41, "color": "red"}'}});
Java
newdoc = new InsertDocumentRequest();
HashMap<String,Object> doc = new HashMap<String,Object>(){{
    put("title","Yellow bag");
    put("meta",
        new HashMap<String,Object>(){{
            put("size",41);
            put("color","red");
        }});
 }};
newdoc.index("products").id(0L).setDoc(doc);
sqlresult = indexApi.insert(newdoc);
C#
Dictionary<string, Object> meta = new Dictionary<string, Object>(); 
meta.Add("size", 41);
meta.Add("color", "red");
Dictionary<string, Object> doc = new Dictionary<string, Object>(); 
doc.Add("title", "Yellow bag");
doc.Add("meta", meta);
InsertDocumentRequest newdoc = new InsertDocumentRequest(index: "products", id: 0, doc: doc);
var sqlresult = indexApi.Insert(newdoc);
¶ 12.1.2

Adding rules to a percolate table

In a percolate table documents that are percolate query rules are stored and must follow the exact schema of four fields:

field type description
id bigint PQ rule identifier (if omitted, it will be assigned automatically)
query string Full-text query (can be empty) compatible with the percolate table
filters string Additional filters by non-full-text fields (can be empty) compatible with the percolate table
tags string A string with one or many comma-separated tags, which may be used to selectively show/delete saved queries

Any other field names are not supported and will trigger an error.

Warning: Inserting/replacing JSON-formatted PQ rules via SQL will not work. In other words, the JSON-specific operators (match etc) will be considered just parts of the rule's text that should match with documents. If you prefer JSON syntax, use the HTTP endpoint instead of INSERT/REPLACE.

SQL
INSERT INTO pq(id, query, filters) VALUES (1, '@title shoes', 'price > 5');
INSERT INTO pq(id, query, tags) VALUES (2, '@title bag', 'Louis Vuitton');
SELECT * FROM pq;
+------+--------------+---------------+---------+
| id   | query        | tags          | filters |
+------+--------------+---------------+---------+
|    1 | @title shoes |               | price>5 |
|    2 | @title bag   | Louis Vuitton |         |
+------+--------------+---------------+---------+
JSON
There are two ways you can add a percolate query into a percolate table: * Query in JSON /search compatible format, described at [json/search](#HTTP-JSON)
PUT /pq/pq_table/doc/1
{
  "query": {
    "match": {
      "title": "shoes"
    },
    "range": {
      "price": {
        "gt": 5
      }
    }
  },
  "tags": ["Loius Vuitton"]
}
* Query in SQL format, described at [search query syntax](#Queries-in-SQL-format)
PUT /pq/pq_table/doc/2
{
  "query": {
    "ql": "@title shoes"
  },
  "filters": "price > 5",
  "tags": ["Loius Vuitton"]
}
PHP
$newstoredquery = [
    'index' => 'test_pq',
    'body' => [
        'query' => [
                       'match' => [
                               'title' => 'shoes'
                       ]
               ],
               'range' => [
                       'price' => [
                               'gt' => 5
                       ]
               ]
       ],
    'tags' => ['Loius Vuitton']
];
$client->pq()->doc($newstoredquery);
Python
newstoredquery ={"index" : "test_pq", "id" : 2, "doc" : {"query": {"ql": "@title shoes"},"filters": "price > 5","tags": ["Loius Vuitton"]}}
indexApi.insert(newstoredquery)
Javascript
newstoredquery ={"index" : "test_pq", "id" : 2, "doc" : {"query": {"ql": "@title shoes"},"filters": "price > 5","tags": ["Loius Vuitton"]}};
indexApi.insert(newstoredquery);
Java
newstoredquery = new HashMap<String,Object>(){{
    put("query",new HashMap<String,Object >(){{
        put("q1","@title shoes");
        put("filters","price>5");
        put("tags",new String[] {"Loius Vuitton"});
    }});
}};
newdoc.index("test_pq").id(2L).setDoc(doc);
indexApi.insert(newdoc);
C#
Dictionary<string, Object> query = new Dictionary<string, Object>(); 
query.Add("q1", "@title shoes");
query.Add("filters", "price>5");
query.Add("tags", new List<string> {"Loius Vuitton"});
Dictionary<string, Object> newstoredquery = new Dictionary<string, Object>(); 
newstoredquery.Add("query", query);
InsertDocumentRequest newdoc = new InsertDocumentRequest(index: "test_pq", id: 2, doc: doc);
indexApi.Insert(newdoc);

Auto ID provisioning

If you don't specify an ID, it will be assigned automatically. You can read more about auto-ID here.

SQL
INSERT INTO pq(query, filters) VALUES ('wristband', 'price > 5');
SELECT * FROM pq;
+---------------------+-----------+------+---------+
| id                  | query     | tags | filters |
+---------------------+-----------+------+---------+
| 1657843905795719192 | wristband |      | price>5 |
+---------------------+-----------+------+---------+
JSON
PUT /pq/pq_table/doc
{
"query": {
  "match": {
    "title": "shoes"
  },
  "range": {
    "price": {
      "gt": 5
    }
  }
},
"tags": ["Loius Vuitton"]
}

PUT /pq/pq_table/doc
{
"query": {
  "ql": "@title shoes"
},
"filters": "price > 5",
"tags": ["Loius Vuitton"]
}
{
  "index": "pq_table",
  "type": "doc",
  "_id": "1657843905795719196",
  "result": "created"
}

{
  "index": "pq_table",
  "type": "doc",
  "_id": "1657843905795719198",
  "result": "created"
}
PHP
$newstoredquery = [
    'index' => 'pq_table',
    'body' => [
        'query' => [
                       'match' => [
                               'title' => 'shoes'
                       ]
               ],
               'range' => [
                       'price' => [
                               'gt' => 5
                       ]
               ]
       ],
    'tags' => ['Loius Vuitton']
];
$client->pq()->doc($newstoredquery);
Array(
       [index] => pq_table
       [type] => doc
       [_id] => 1657843905795719198
       [result] => created
)
Python
indexApi = api = manticoresearch.IndexApi(client)
newstoredquery ={"index" : "test_pq",   "doc" : {"query": {"ql": "@title shoes"},"filters": "price > 5","tags": ["Loius Vuitton"]}}
indexApi.insert(store_query)
{'created': True,
 'found': None,
 'id': 1657843905795719198,
 'index': 'test_pq',
 'result': 'created'}
Javascript
newstoredquery ={"index" : "test_pq",  "doc" : {"query": {"ql": "@title shoes"},"filters": "price > 5","tags": ["Loius Vuitton"]}};
res =  await indexApi.insert(store_query);
{"_index":"test_pq","_id":1657843905795719198,"created":true,"result":"created"}
Java
newstoredquery = new HashMap<String,Object>(){{
    put("query",new HashMap<String,Object >(){{
        put("q1","@title shoes");
        put("filters","price>5");
        put("tags",new String[] {"Loius Vuitton"});
    }});
}};
newdoc.index("test_pq").setDoc(doc);
indexApi.insert(newdoc);
C#
Dictionary<string, Object> query = new Dictionary<string, Object>(); 
query.Add("q1", "@title shoes");
query.Add("filters", "price>5");
query.Add("tags", new List<string> {"Loius Vuitton"});
Dictionary<string, Object> newstoredquery = new Dictionary<string, Object>(); 
newstoredquery.Add("query", query);
InsertDocumentRequest newdoc = new InsertDocumentRequest(index: "test_pq", doc: doc);
indexApi.Insert(newdoc);

No schema in SQL

In case of omitted schema in SQL INSERT command, the following parameters are expected:
1. ID. You can use 0 as the ID to trigger auto-ID generation.
2. Query - Full-text query.
3. Tags - PQ rule tags string.
4. Filters - Additional filters by attributes.

SQL
INSERT INTO pq VALUES (0, '@title shoes', '', '');
INSERT INTO pq VALUES (0, '@title shoes', 'Louis Vuitton', '');
SELECT * FROM pq;
+---------------------+--------------+---------------+---------+
| id                  | query        | tags          | filters |
+---------------------+--------------+---------------+---------+
| 2810855531667783688 | @title shoes |               |         |
| 2810855531667783689 | @title shoes | Louis Vuitton |         |
+---------------------+--------------+---------------+---------+

Replacing rules in a PQ table

To replace an existing PQ rule with a new one in SQL, just use a regular REPLACE command. There's a special syntax ?refresh=1 to replace a PQ rule defined in JSON mode via the HTTP JSON interface.

SQL
mysql> select * from pq;
+---------------------+--------------+------+---------+
| id                  | query        | tags | filters |
+---------------------+--------------+------+---------+
| 2810823411335430148 | @title shoes |      |         |
+---------------------+--------------+------+---------+
1 row in set (0.00 sec)

mysql> replace into pq(id,query) values(2810823411335430148,'@title boots');
Query OK, 1 row affected (0.00 sec)

mysql> select * from pq;
+---------------------+--------------+------+---------+
| id                  | query        | tags | filters |
+---------------------+--------------+------+---------+
| 2810823411335430148 | @title boots |      |         |
+---------------------+--------------+------+---------+
1 row in set (0.00 sec)
JSON
GET /pq/pq/doc/2810823411335430149
{
  "took": 0,
  "timed_out": false,
  "hits": {
    "total": 1,
    "hits": [
      {
        "_id": "2810823411335430149",
        "_score": 1,
        "_source": {
          "query": {
            "match": {
              "title": "shoes"
            }
          },
          "tags": "",
          "filters": ""
        }
      }
    ]
  }
}

PUT /pq/pq/doc/2810823411335430149?refresh=1 -d '{
  "query": {
    "match": {
      "title": "boots"
    }
  }
}'

GET /pq/pq/doc/2810823411335430149
{
  "took": 0,
  "timed_out": false,
  "hits": {
    "total": 1,
    "hits": [
      {
        "_id": "2810823411335430149",
        "_score": 1,
        "_source": {
          "query": {
            "match": {
              "title": "boots"
            }
          },
          "tags": "",
          "filters": ""
        }
      }
    ]
  }
}
¶ 12.2

Adding data from external storages

¶ 12.2.1

Plain tables creation

Plain tables are tables that are created one-time by fetching data at creation from one or several sources. A plain table is immutable as documents cannot be added or deleted during its lifespan. It is only possible to update values of numeric attributes (including MVA). Refreshing the data is only possible by recreating the whole table.

Plain tables are available only in the Plain mode and their definition is made up of a table declaration and one or several source declarations. The data gathering and table creation are not made by the searchd server but by the auxiliary tool indexer.

Indexer is a command-line tool that can be called directly from the command line or from shell scripts.

It can accept a number of arguments when called, but there are also several settings of its own in the Manticore configuration file.

In the typical scenario, indexer does the following:

Indexer tool

The indexer tool is used to create plain tables in Manticore Search. It has a general syntax of:

indexer [OPTIONS] [table_name1 [table_name2 [...]]]

When creating tables with indexer, the generated table files must be made with permissions that allow searchd to read, write, and delete them. In case of the official Linux packages, searchd runs under the manticore user. Therefore, indexer must also run under the manticore user:

sudo -u manticore indexer ...

If you are running searchd differently, you might need to omit sudo -u manticore. Just make sure that the user under which your searchd instance is running has read/write permissions to the tables generated using indexer.

To create a plain table, you need to list the table(s) you want to process. For example, if your manticore.conf file contains details on two tables, mybigindex and mysmallindex, you could run:

sudo -u manticore indexer mysmallindex mybigindex

You can also use wildcard tokens to match table names:

sudo -u manticore indexer indexpart*main --rotate

The exit codes for indexer are as follows:

Also, you can run indexer via a systemctl unit file:

systemctl start --no-block manticore-indexer

Or, in case you want to build a specific table:

systemctl start --no-block manticore-indexer@specific-table-name

Find more information about scheduling indexer via systemd below.

Indexer command line arguments

shell sudo -u manticore indexer --config /home/myuser/manticore.conf mytable

shell sudo -u manticore indexer --config /home/myuser/manticore.conf --all

shell sudo -u manticore indexer --rotate --all

shell sudo -u manticore indexer --rotate --all --quiet

shell sudo -u manticore indexer --rotate --all --noprogress

shell sudo -u manticore indexer mytable --buildstops word_freq.txt 1000

This would produce a document in the current directory, word_freq.txt, with the 1,000 most common words in 'mytable', ordered by most common first. Note that the file will pertain to the last table indexed when specified with multiple tables or --all (i.e. the last one listed in the configuration file)

shell sudo -u manticore indexer mytable --buildstops word_freq.txt 1000 --buildfreqs

This would produce the word_freq.txt as above, however after each word would be the number of times it occurred in the table in question.

shell sudo -u manticore indexer --merge main delta --rotate

In the above example, where the main is the master, rarely modified table, and the delta is more frequently modified one, you might use the above to call indexer to combine the contents of the delta into the main table and rotate the tables.

shell sudo -u manticore indexer --merge main delta --merge-dst-range deleted 0 0

Any documents marked as deleted (value 1) will be removed from the newly-merged destination table. It can be added several times to the command line, to add successive filters to the merge, all of which must be met in order for a document to become part of the final table.

shell sudo -u manticore indexer mytable --keep-attrs=/path/to/index/files

shell sudo -u manticore indexer mytable --keep-attrs=/path/to/table/files --keep-attrs-names=update,state

shell sudo -u manticore indexer --rotate --nohup mytable sudo -u manticore indextool --rotate --check mytable

Indexer configuration settings

You can also configure indexer behavior in the Manticore configuration file in the indexer section:

indexer {
...
}

lemmatizer_cache

lemmatizer_cache = 256M

Lemmatizer cache size. Optional, default is 256K.

Our lemmatizer implementation uses a compressed dictionary format that enables a space/speed tradeoff. It can either perform lemmatization off the compressed data, using more CPU but less RAM, or it can decompress and precache the dictionary either partially or fully, thus using less CPU but more RAM. The lemmatizer_cache directive lets you control how much RAM exactly can be spent for that uncompressed dictionary cache.

Currently, the only available dictionaries are ru.pak, en.pak, and de.pak. These are the Russian, English, and German dictionaries. The compressed dictionary is approximately 2 to 10 MB in size. Note that the dictionary stays in memory at all times too. The default cache size is 256 KB. The accepted cache sizes are 0 to 2047 MB. It's safe to raise the cache size too high; the lemmatizer will only use the needed memory. For example, the entire Russian dictionary decompresses to approximately 110 MB; thus settinglemmatizer_cache higher than that will not affect the memory use. Even when 1024 MB is allowed for the cache, if only 110 MB is needed, it will only use those 110 MB.

max_file_field_buffer

max_file_field_buffer = 128M

Maximum file field adaptive buffer size in bytes. Optional, default is 8MB, minimum is 1MB.

The file field buffer is used to load files referred to from sql_file_field columns. This buffer is adaptive, starting at 1 MB at first allocation, and growing in 2x steps until either the file contents can be loaded or the maximum buffer size, specified by the max_file_field_buffer directive, is reached.

Thus, if no file fields are specified, no buffer is allocated at all. If all files loaded during indexing are under (for example) 2 MB in size, but the max_file_field_buffer value is 128 MB, the peak buffer usage would still be only 2 MB. However, files over 128 MB would be entirely skipped.

max_iops

max_iops = 40

Maximum I/O operations per second, for I/O throttling. Optional, default is 0 (unlimited).

I/O throttling related option. It limits the maximum count of I/O operations (reads or writes) per any given second. A value of 0 means that no limit is imposed.

indexer can cause bursts of intensive disk I/O during building a table, and it might be desirable to limit its disk activity (and reserve something for other programs running on the same machine, such as searchd). I/O throttling helps to do that. It works by enforcing a minimum guaranteed delay between subsequent disk I/O operations performed by indexer. Throttling I/O can help reduce search performance degradation caused by building. This setting is not effective for other kinds of data ingestion, e.g. inserting data into a real-time table.

max_iosize

max_iosize = 1048576

Maximum allowed I/O operation size, in bytes, for I/O throttling. Optional, default is 0 (unlimited).

I/O throttling related option. It limits the maximum file I/O operation (read or write) size for all operations performed by indexer. A value of 0 means that no limit is imposed. Reads or writes that are bigger than the limit will be split into several smaller operations, and counted as several operations by the max_iops setting. At the time of this writing, all I/O calls should be under 256 KB (default internal buffer size) anyway, so max_iosize values higher than 256 KB should not have any effect.

max_xmlpipe2_field

max_xmlpipe2_field = 8M

Maximum allowed field size for XMLpipe2 source type, in bytes. Optional, default is 2 MB.

mem_limit

mem_limit = 256M
# mem_limit = 262144K # same, but in KB
# mem_limit = 268435456 # same, but in bytes

Plain table building RAM usage limit. Optional, default is 128 MB. Enforced memory usage limit that the indexer will not go above. Can be specified in bytes, or kilobytes (using K postfix), or megabytes (using M postfix); see the example. This limit will be automatically raised if set to an extremely low value causing I/O buffers to be less than 8 KB; the exact lower bound for that depends on the built data size. If the buffers are less than 256 KB, a warning will be produced.

The maximum possible limit is 2047M. Too low values can hurt plain table building speed, but 256M to 1024M should be enough for most, if not all datasets. Setting this value too high can cause SQL server timeouts. During the document collection phase, there will be periods when the memory buffer is partially sorted and no communication with the database is performed; and the database server can timeout. You can resolve that either by raising timeouts on the SQL server side or by lowering mem_limit.

on_file_field_error

on_file_field_error = skip_document

How to handle IO errors in file fields. Optional, default is ignore_field.
When there is a problem indexing a file referenced by a file field (sql_file_field), indexer can either process the document, assuming empty content in this particular field, or skip the document, or fail indexing entirely. on_file_field_error directive controls that behavior. The values it takes are:

The problems that can arise are: open error, size error (file too big), and data read error. Warning messages on any problem will be given at all times, regardless of the phase and the on_file_field_error setting.

Note that with on_file_field_error = skip_document documents will only be ignored if problems are detected during an early check phase, and not during the actual file parsing phase. indexer will open every referenced file and check its size before doing any work, and then open it again when doing actual parsing work. So in case a file goes away between these two open attempts, the document will still be indexed.

write_buffer

write_buffer = 4M

Write buffer size, bytes. Optional, default is 1MB. Write buffers are used to write both temporary and final table files when indexing. Larger buffers reduce the number of required disk writes. Memory for the buffers is allocated in addition to mem_limit. Note that several (currently up to 4) buffers for different files will be allocated, proportionally increasing the RAM usage.

ignore_non_plain

ignore_non_plain = 1

ignore_non_plain allows you to completely ignore warnings about skipping non-plain tables. The default is 0 (not ignoring).

Schedule indexer via systemd

There are two approaches to scheduling indexer runs. The first way is the classical method of using crontab. The second way is using a systemd timer with a user-defined schedule. To create the timer unit files, you should place them in the appropriate directory where systemd looks for such unit files. On most Linux distributions, this directory is typically /etc/systemd/system. Here's how to do it:

  1. Create a timer unit file for your custom schedule:
cat << EOF > /etc/systemd/system/manticore-indexer@.timer
[Unit]
Description=Run Manticore Search's indexer on schedule
[Timer]
OnCalendar=minutely
RandomizedDelaySec=5m
Unit=manticore-indexer@%i.service
[Install]
WantedBy=timers.target
EOF

More on the OnCalendar syntax and examples can be found here.

  1. Edit the timer unit for your specific needs.
  2. Enable the timer:
systemctl enable manticore-indexer@idx1.timer
  1. Start the timer:
systemctl start manticore-indexer@idx1.timer
  1. Repeat steps 2-4 for any additional timers.
¶ 12.2.2

Fetching from databases

¶ 12.2.2.1

Introduction

Manticore Search allows fetching data from databases using specialized drivers or ODBC. Current drivers include:

To fetch data from the database, a source must be configured with type as one of the above. The source requires information about how to connect to the database and the query that will be used to fetch the data. Additional pre- and post-queries can also be set - either to configure session settings or to perform pre/post fetch tasks. The source also must contain definitions of data types for the columns that are fetched.

¶ 12.2.2.2

Database connection

The source definition must contain the settings of the connection, this includes the host, port, user credentials, or specific settings of a driver.

sql_host

The database server host to connect to. Note that the MySQL client library chooses whether to connect over TCP/IP or over UNIX socket based on the host name. Specifically, "localhost" will force it to use UNIX socket (this is the default and generally recommended mode) and "127.0.0.1" will force TCP/IP usage.

sql_port

The server IP port to connect to.
For mysql the default is 3306 and for pgsql, it is 5432.

sql_db

The SQL database to use after the connection is established and perform further queries within.

sql_user

The username used for connecting.

sql_pass

The user password to use when connecting. If the password includes # (which can be used to add comments in the configuration file), you can escape it with \.

sql_sock

UNIX socket name to connect to for local database servers. Note that it depends on the sql_host setting whether this value will actually be used.

sql_sock = /var/lib/mysql/mysql.sock

Specific settings for drivers

MySQL

mysql_connect_flags

MySQL client connection flags. Optional, the default value is 0 (do not set any flags).

This option must contain an integer value with the sum of the flags. The value will be passed to mysql_real_connect() verbatim. The flags are enumerated in mysql_com.h include file. Flags that are especially interesting in regard to indexing, with their respective values, are as follows:

mysql_connect_flags = 32 # enable compression

SSL certificate settings

unpack_mysqlcompress

unpack_mysqlcompress_maxsize = 1M

Columns to unpack using MySQL UNCOMPRESS() algorithm. Multi-value, optional, default value is an empty list of columns.

Columns specified using this directive will be unpacked by the indexer using the modified zlib algorithm used by MySQL COMPRESS() and UNCOMPRESS() functions. When indexing on a different box than the database, this lets you offload the database and save on network traffic. This feature is only available if zlib and zlib-devel were both available during build time.

unpack_mysqlcompress = body_compressed
unpack_mysqlcompress = description_compressed

By default, a buffer of 16M is used for uncompressing the data. This can be changed by setting unpack_mysqlcompress_maxsize.

When using unpack_mysqlcompress, due to implementation intricacies, it is not possible to deduce the required buffer size from the compressed data. So, the buffer must be preallocated in advance, and the unpacked data can not go over the buffer size.

unpack_zlib

unpack_zlib = col1
unpack_zlib = col2

Columns to unpack using zlib (aka deflate, aka gunzip). Multi-value, optional, default value is an empty list of columns. Applies to source types mysql and pgsql only.

Columns specified using this directive will be unpacked by the indexer using the standard zlib algorithm (called deflate and also implemented by gunzip). When indexing on a different box than the database, this lets you offload the database and save on network traffic. This feature is only available if zlib and zlib-devel were both available during build time.

MSSQL

MS SQL Windows authentication flag. Whether to use currently logged-in Windows account credentials for authentication when connecting to MS SQL Server.

mssql_winauth = 1

ODBC

Sources using ODBC require the presence of a DSN (Data Source Name) string which can be set with odbc_dsn.

odbc_dsn = Driver={Oracle ODBC Driver};Dbq=myDBName;Uid=myUsername;Pwd=myPassword

Please note that the format depends on the specific ODBC driver used.

¶ 12.2.2.3

Execution of fetch queries

With all the SQL drivers, building a plain table generally works as follows.

Example of a source fetching data from MYSQL:

source mysource {
  type             = mysql
  path             = /path/to/realtime
  sql_host         = localhost
  sql_user         = myuser
  sql_pass         = mypass
  sql_db           = mydb
  sql_query_pre    = SET CHARACTER_SET_RESULTS=utf8
  sql_query_pre    = SET NAMES utf8
  sql_query        =  SELECT id, title, description, category_id  FROM mytable
  sql_query_post   = DROP TABLE view_table
  sql_query_post_index = REPLACE INTO counters ( id, val ) \
    VALUES ( 'max_indexed_id', $maxid )
  sql_attr_uint    = category_id
  sql_field_string = title
 }

table mytable {
  type   = plain
  source = mysource
  path   = /path/to/mytable
  ...
 }

sql_query

This is the query used to retrieve documents from a SQL server. There can be only one sql_query declared, and it's mandatory to have one. See also Processing fetched data

sql_query_pre

Pre-fetch query or pre-query. This is a multi-value, optional setting, with the default being an empty list of queries. The pre-queries are executed before the sql_query in the order they appear in the configuration file. The results of the pre-queries are ignored.

Pre-queries are useful in many ways. They can be used to set up encoding, mark records that are going to be indexed, update internal counters, set various per-connection SQL server options and variables, and so on.

Perhaps the most frequent use of pre-query is to specify the encoding that the server will use for the rows it returns. Note that Manticore accepts only UTF-8 text. Two MySQL specific examples of setting the encoding are:

sql_query_pre = SET CHARACTER_SET_RESULTS=utf8
sql_query_pre = SET NAMES utf8

Also, specific to MySQL sources, it is useful to disable query cache (for indexer connection only) in pre-query, because indexing queries are not going to be re-run frequently anyway, and there's no sense in caching their results.
That could be achieved with:

sql_query_pre = SET SESSION query_cache_type=OFF

sql_query_post

Post-fetch query. This is an optional setting, with the default value being empty.

This query is executed immediately after sql_query completes successfully. When the post-fetch query produces errors, they are reported as warnings, but indexing is not terminated. Its result set is ignored. Note that indexing is not yet completed at the point when this query gets executed, and further indexing may still fail. Therefore, any permanent updates should not be done from here. For instance, updates on a helper table that permanently change the last successfully indexed ID should not be run from the sql_query_post query; they should be run from the sql_query_post_index query instead.

sql_query_post_index

Post-processing query. This is an optional setting, with the default value being empty.

This query is executed when indexing is fully and successfully completed. If this query produces errors, they are reported as warnings, but indexing is not terminated. Its result set is ignored. The $maxid macro can be used in its text; it will be expanded to the maximum document ID that was actually fetched from the database during indexing. If no documents were indexed, $maxid will be expanded to 0.

Example:

sql_query_post_index = REPLACE INTO counters ( id, val ) \
    VALUES ( 'max_indexed_id', $maxid )

The difference between sql_query_post and sql_query_post_index is that sql_query_post is run immediately when Manticore receives all the documents, but further indexing may still fail for some other reason. On the contrary, by the time the sql_query_post_index query gets executed, it is guaranteed that the table was created successfully. Database connection is dropped and re-established because sorting phase can be very lengthy and would just time out otherwise.

¶ 12.2.2.4

Processing fetched data

By default, the first column from the result set of sql_query is indexed as the document id.

Document ID MUST be the very first field, and it MUST BE UNIQUE SIGNED (NON-ZERO) INTEGER NUMBER from -9223372036854775808 to 9223372036854775807.

You can specify up to 256 full-text fields and an arbitrary amount of attributes. All the columns that are neither document ID (the first one) nor attributes will be indexed as full-text fields.

Declaration of attributes:

sql_attr_bigint

Declares a 64-bit signed integer.

sql_attr_bool

Declares a boolean attribute. It's equivalent to an integer attribute with bit count of 1.

sql_attr_float

Declares a floating point attribute.

The values will be stored in single precision, 32-bit IEEE 754 format. Represented range is approximately from 1e-38 to 1e+38. The amount of decimal digits that can be stored precisely is approximately 7.

One important usage of float attributes is storing latitude and longitude values (in radians), for further usage in query-time geosphere distance calculations.

sql_attr_json

Declares a JSON attribute.

When indexing JSON attributes, Manticore expects a text field with JSON formatted data. JSON attributes support arbitrary JSON data with no limitation in nested levels or types.

sql_attr_multi

Declares a multi-value attribute.

Plain attributes only allow attaching 1 value per each document. However, there are cases (such as tags or categories) when it is desired to attach multiple values of the same attribute and be able to apply filtering or grouping to value lists.

The MVA can take the values from a column (like the rest of the data types) - in this case, the column in the result set must provide a string with multiple integer values separated by commas - or by running a separate query to get the values.

When executing a query, the engine runs the query, groups the results by IDs, and assigns the values to their corresponding documents in the table. Values with an ID not found in the table are discarded. Before executing the query, any defined sql_query_pre_all will be run.

The declaration format for sql_attr_multi is as follows:

sql_attr_multi = ATTR-TYPE ATTR-NAME 'from' SOURCE-TYPE \
    [;QUERY] \
    [;RANGED-QUERY]

where

It's used with ranged-query SOURCE-TYPE. If using ranged-main-query SOURCE-TYPE, then omit the RANGED-QUERY, and it will automatically use the same query from sql_query_range(useful option in complex inheritance setups to save having to manually duplicate the same query many times).

sql_attr_multi = uint tag from field
sql_attr_multi = uint tag from query; SELECT id, tag FROM tags
sql_attr_multi = bigint tag from ranged-query; \
    SELECT id, tag FROM tags WHERE id>=$start AND id<=$end; \
    SELECT MIN(id), MAX(id) FROM tags

sql_attr_string

Declares a string attribute. The maximum size of each value is fixed at 4GB.

sql_attr_timestamp

Declares a UNIX timestamp.

Timestamps can store dates and times in the range of January 01, 1970, to January 19, 2038, with a precision of one second. The expected column value should be a timestamp in UNIX format, which is a 32-bit unsigned integer number of seconds elapsed since midnight on January 01, 1970, GMT. Timestamps are internally stored and handled as integers everywhere. In addition to working with timestamps as integers, you can also use them with different date-based functions, such as time segments sorting mode or day/week/month/year extraction for GROUP BY.

Note that DATE or DATETIME column types in MySQL cannot be directly used as timestamp attributes in Manticore; you need to explicitly convert such columns using UNIX_TIMESTAMP function (if the data is in range).

Note timestamps can not represent dates before January 01, 1970, and UNIX_TIMESTAMP() in MySQL will not return anything expected. If you only need to work with dates, not times, consider TO_DAYS() function in MySQL instead.

sql_attr_uint

Declares an unsigned integer attribute.

You can specify the bit count for integer attributes by appending ':BITCOUNT' to attribute name (see example below). Attributes with less than default 32-bit size, or bitfields, perform slower.

sql_attr_uint = group_id
sql_attr_uint = forum_id:9 # 9 bits for forum_id

sql_field_string

Declares a combo string attribute/text field. The values will be indexed as a full-text field, but also stored in a string attribute with the same name. Note, it should only be used when you are sure you want the field to be searchable both in a full-text manner and as an attribute (with the ability to sort and group by it). If you just want to be able to fetch the original value of the field, you don't need to do anything for it unless you implicitly removed the field from the stored fields list via stored_fields.

sql_field_string = name

sql_file_field

Declares a file based field.

This directive makes indexer interpret field contents as a file name, and load and process the referred file. Files larger than max_file_field_buffer in size are skipped. Any errors during the file loading (IO errors, missed limits, etc.) will be reported as indexing warnings and will not early terminate the indexing. No content will be indexed for such files.

sql_file_field = field_name

sql_joined_field

Joined/payload field fetch query. Multi-value, optional, the default is an empty list of queries.

sql_joined_field lets you use two different features: joined fields and payloads (payload fields). Its syntax is as follows:

sql_joined_field = FIELD-NAME 'from'  ( 'query' | 'payload-query' | 'ranged-query' | 'ranged-main-query' ); \
        QUERY [ ; RANGE-QUERY ]

where

Joined fields let you avoid JOIN and/or GROUP_CONCAT statements in the main document fetch query (sql_query). This can be useful when the SQL-side JOIN is slow, or needs to be offloaded on the Manticore side, or simply to emulate MySQL-specific GROUP_CONCAT functionality in case your database server does not support it.

The query must return exactly 2 columns: document ID, and text to append to a joined field. Document IDs can be duplicate, but they must be in ascending order. All the text rows fetched for a given ID will be concatenated together, and the concatenation result will be indexed as the entire contents of a joined field. Rows will be concatenated in the order returned from the query, and separating whitespace will be inserted between them. For instance, if the joined field query returns the following rows:

( 1, 'red' )
( 1, 'right' )
( 1, 'hand' )
( 2, 'mysql' )
( 2, 'manticore' )

then the indexing results would be equivalent to adding a new text field with a value of 'red right hand' to document 1 and 'mysql sphinx' to document 2, including the keyword positions inside the field in the order they come from the query. If the rows need to be in a specific order, that needs to be explicitly defined in the query.

Joined fields are only indexed differently. There are no other differences between joined fields and regular text fields.

Before executing the joined fields query, any set of sql_query_pre_all will be run, if any exist. This allows you to set the desired encoding, etc., within the joined fields' context.

When a single query is not efficient enough or does not work because of the database driver limitations, ranged queries can be used. It works similarly to the ranged queries in the main indexing loop. The range will be queried for and fetched upfront once, then multiple queries with different $start and $end substitutions will be run to fetch the actual data.

When using ranged-main-query query, omit the ranged-query, and it will automatically use the same query from sql_query_range (a useful option in complex inheritance setups to save having to manually duplicate the same query many times).

Payloads let you create a special field in which, instead of keyword positions, so-called user payloads are stored. Payloads are custom integer values attached to every keyword. They can then be used at search time to affect the ranking.

The payload query must return exactly 3 columns:
- document ID
- keyword
- and integer payload value.

Document IDs can be duplicate, but they must be in ascending order. Payloads must be unsigned integers within the 24-bit range, i.e., from 0 to 16777215.

The only ranker that accounts for payloads is proximity_bm25 (the default ranker). On tables with payload fields, it will automatically switch to a variant that matches keywords in those fields, computes a sum of matched payloads multiplied by field weights, and adds that sum to the final rank.

Please note that the payload field is ignored for full-text queries containing complex operators. It only works for simple bag-of-words queries.

Configuration example:

Configuration file
source min {
    type = mysql
    sql_host = localhost
    sql_user = test
    sql_pass =
    sql_db = test
    sql_query = select 1, 'Nike bag' f \
    UNION select 2, 'Adidas bag' f \
    UNION select 3, 'Reebok bag' f \
    UNION select 4, 'Nike belt' f

    sql_joined_field = tag from payload-query; select 1 id, 'nike' tag, 10 weight \
    UNION select 4 id, 'nike' tag, 10 weight;
}

index idx {
    path = idx
    source = min
}
Just SELECT
mysql> select * from idx;
+------+------------+------+
| id   | f          | tag  |
+------+------------+------+
|    1 | Nike bag   | nike |
|    2 | Adidas bag |      |
|    3 | Reebok bag |      |
|    4 | Nike belt  | nike |
+------+------------+------+
4 rows in set (0.00 sec)
Full-text search
Note that when you search for `nike | adidas`, the results containing "nike" receive a higher weight due to the "nike" tag and its weight originating from the payload query.
mysql> select *, weight() from idx where match('nike|adidas');
+------+------------+------+----------+
| id   | f          | tag  | weight() |
+------+------------+------+----------+
|    1 | Nike bag   | nike |    11539 |
|    4 | Nike belt  | nike |    11539 |
|    2 | Adidas bag |      |     1597 |
+------+------------+------+----------+
3 rows in set (0.01 sec)
Complex full-text search
Note that the special payload field is ignored for full-text queries containing complex operators. It only works for simple bag-of-words queries.
mysql> select *, weight() from idx where match('"nike bag"|"adidas bag"');
+------+------------+------+----------+
| id   | f          | tag  | weight() |
+------+------------+------+----------+
|    2 | Adidas bag |      |     2565 |
|    1 | Nike bag   | nike |     2507 |
+------+------------+------+----------+
2 rows in set (0.00 sec)

sql_column_buffers

sql_column_buffers = <colname>=<size>[K|M] [, ...]

Per-column buffer sizes. Optional, default is empty (deduce the sizes automatically). Applies to odbc, mssql source types only.

ODBC and MS SQL drivers sometimes cannot return the maximum actual column size to be expected. For instance,NVARCHAR(MAX) columns always report their length as 2147483647 bytes to indexer even though the actually used length is likely considerably less. However, the receiving buffers still need to be allocated upfront, and their sizes have to be determined. When the driver does not report the column length at all, Manticore allocates default 1 KB buffers for each non-char column, and 1 MB buffers for each char column. Driver-reported column length also gets clamped by an upper limit of 8 MB, so in case the driver reports (almost) a 2 GB column length, it will be clamped and an 8 MB buffer will be allocated instead for that column. These hard-coded limits can be overridden using the sql_column_buffers directive, either in order to save memory on actually shorter columns or to overcome the 8 MB limit on actually longer columns. The directive values must be a comma-separated list of selected column names and sizes:

Example:

sql_query = SELECT id, mytitle, mycontent FROM documents
sql_column_buffers = mytitle=64K, mycontent=10M
¶ 12.2.2.5

Ranged queries

Main query, which needs to fetch all the documents, can impose a read lock on the whole table and stall the concurrent queries (e.g. INSERTs to MyISAM table), waste a lot of memory for result set, etc. To avoid this, Manticore supports so-called ranged queries. With ranged queries, Manticore first fetches min and max document IDs from the table, and then substitutes different ID intervals into main query text and runs the modified query to fetch another chunk of documents. Here's an example.

Ranged query usage example:

sql_query_range = SELECT MIN(id),MAX(id) FROM documents
sql_range_step = 1000
sql_query = SELECT * FROM documents WHERE id>=$start AND id<=$end

If the table contains document IDs from 1 to, say, 2345, then sql_query would be run three times:

  1. with $start replaced with 1 and $end replaced with 1000;
  2. with $start replaced with 1001 and $end replaced with 2000;
  3. with $start replaced with 2001 and $end replaced with 2345.

Obviously, that's not much of a difference for 2000-row table, but when it comes to indexing 10-million-row table, ranged queries might be of some help.

sql_query_range

Defines the range query. The query specified in this option must fetch min and max document IDs that will be used as range boundaries. It must return exactly two integer fields, min ID first and max ID second; the field names are ignored. When enabled, sql_query will be required to contain $start and $end macros. Note that the intervals specified by $start..$end will not overlap, so you should not remove document IDs that are exactly equal to $start or $end from your query.

sql_range_step

This directive defines the range query step. The default value is 1024.

sql_ranged_throttle

This directive can be used to throttle the ranged query. By default, there is no throttling. Values for sql_ranged_throttle should be specified in milliseconds.

Throttling can be useful when the indexer imposes too much load on the database server. It causes the indexer to sleep for a given amount of time once per each ranged query step. This sleep is unconditional and is performed before the fetch query.

sql_ranged_throttle = 1000 # sleep for 1 sec before each query step
¶ 12.2.3

Fetching from XML streams

The xmlpipe2 source type allows for passing custom full-text and attribute data to Manticore in a custom XML format, with the schema (i.e., set of fields and attributes) specified in either the XML stream itself or in the source settings.

Declaration of XML stream

To declare the XML stream, the xmlpipe_command directive is mandatory and contains the shell command that produces the XML stream to be indexed. This can be a file, but it can also be a program that generates XML content on-the-fly.

XML file format

When indexing an xmlpipe2 source, the indexer runs the specified command, opens a pipe to its stdout, and expects a well-formed XML stream.

Here's an example of what the XML stream data might look like:

<?xml version="1.0" encoding="utf-8"?>
<sphinx:docset>

<sphinx:schema>
<sphinx:field name="subject"/>
<sphinx:field name="content"/>
<sphinx:attr name="published" type="timestamp"/>
<sphinx:attr name="author_id" type="int" bits="16" default="1"/>
</sphinx:schema>

<sphinx:document id="1234">
<content>this is the main content <![CDATA[and this <cdata> entry
must be handled properly by xml parser lib]]></content>
<published>1012325463</published>
<subject>note how field/attr tags can be
in <b> class="red">randomized</b> order</subject>
<misc>some undeclared element</misc>
</sphinx:document>

<sphinx:document id="1235">
<subject>another subject</subject>
<content>here comes another document, and i am given to understand,
that in-document field order must not matter, sir</content>
<published>1012325467</published>
</sphinx:document>

<!-- ... even more sphinx:document entries here ... -->

<sphinx:killlist>
<id>1234</id>
<id>4567</id>
</sphinx:killlist>

</sphinx:docset>

Arbitrary fields and attributes are allowed. They can also occur in the stream in arbitrary order within each document; the order is ignored. There is a restriction on the maximum field length; fields longer than 2 MB will be truncated to 2 MB (this limit can be changed in the source).

The schema, i.e., complete fields and attributes list, must be declared before any document can be parsed. This can be done either in the configuration file by using xmlpipe_field and xmlpipe_attr_XXX settings, or right in the stream using <sphinx:schema> element. <sphinx:schema> is optional. It is only allowed to occur as the very first sub-element in <sphinx:docset>. If there is no in-stream schema definition, settings from the configuration file will be used. Otherwise, stream settings take precedence. Note that the document id should be specified as a property id of tag <sphinx:document> (e.g. <sphinx:document id="1235">) and is supposed to be a unique-signed positive non-zero 64-bit integer.

Unknown tags (which were not declared neither as fields nor as attributes) will be ignored with a warning. In the example above, <misc> will be ignored. All embedded tags and their attributes (such as <strong> in <subject> in the example above) will be silently ignored.

Support for incoming stream encodings depends on whether iconv is installed on the system. xmlpipe2 is parsed using the libexpat parser, which understands US-ASCII, ISO-8859-1, UTF-8, and a few UTF-16 variants natively. Manticore's configure script will also check for libiconv presence and utilize it to handle other encodings. libexpat also enforces the requirement to use the UTF-8 charset on the Manticore side because the parsed data it returns is always in UTF-8.

XML elements (tags) recognized by xmlpipe2 (and their attributes where applicable) are:

Data definition in source configuration

If the XML doesn't define a schema, the data types of tables elements must be defined in the source configuration.

Specific XML source settings

If xmlpipe_fixup_utf8 is set it will enable Manticore-side UTF-8 validation and filtering to prevent XML parser from choking on non-UTF-8 documents. By default, this option is disabled.

Under certain occasions it might be hard or even impossible to guarantee that the incoming XMLpipe2 document bodies are in perfectly valid and conforming UTF-8 encoding. For instance, documents with national single-byte encodings could sneak into the stream. libexpat XML parser is fragile, meaning that it will stop processing in such cases. UTF8 fixup feature lets you avoid that. When fixup is enabled, Manticore will preprocess the incoming stream before passing it to the XML parser and replace invalid UTF-8 sequences with spaces.

xmlpipe_fixup_utf8 = 1

Example of XML source without schema in configuration:

source xml_test_1
{
    type = xmlpipe2
    xmlpipe_command = cat /tmp/products_today.xml
}

Example of XML source with schema in configuration:

source xml_test_2
{
    type = xmlpipe2
    xmlpipe_command = cat /tmp/products_today.xml
    xmlpipe_field = subject
    xmlpipe_field = content
    xmlpipe_attr_timestamp = published
    xmlpipe_attr_uint = author_id:16
}
¶ 12.2.4

Fetching from TSV,CSV

TSV/CSV is the simplest way to pass data to the Manticore indexer. This method was created due to the limitations of xmlpipe2. In xmlpipe2, the indexer must map each attribute and field tag in the XML file to a corresponding schema element. This mapping requires time, and it increases with the number of fields and attributes in the schema. TSV/CSV has no such issue, as each field and attribute corresponds to a particular column in the TSV/CSV file. In some cases, TSV/CSV could work slightly faster than xmlpipe2.

File format

The first column in TSV/CSV file must be a document ID. The rest columns must mirror the declaration of fields and attributes in the schema definition. Note that you don't need to declare the document ID in the schema, since it's always considered to be present, should be in the 1st column and needs to be a unique-signed positive non-zero 64-bit integer.

The difference between tsvpipe and csvpipe is delimiter and quoting rules. tsvpipe has a tab character as hardcoded delimiter and has no quoting rules. csvpipe has the csvpipe_delimiteroption for delimiter with a default value of , and also has quoting rules, such as:

Declaration of TSV stream

tsvpipe_command directive is mandatory and contains the shell command invoked to produce the TSV stream that gets indexed. The command can read a TSV file, but it can also be a program that generates on-the-fly the tab delimited content.

TSV indexed columns

The following directives can be used to declare the types of the indexed columns:

Example of a source using a TSV file:

source tsv_test
{
    type = tsvpipe
    tsvpipe_command = cat /tmp/rock_bands.tsv
    tsvpipe_field = name
    tsvpipe_attr_multi = genre_tags
}
1   Led Zeppelin    35,23,16
2   Deep Purple 35,92
3   Frank Zappa 35,23,16,92,33,24

Declaration of CSV stream

csvpipe_command directive is mandatory and contains the shell command invoked to produce the CSV stream which gets indexed. The command can just read a CSV file but it can also be a program that generates on-the-fly the comma delimited content.

CSV indexed columns

The following directives can be used to declare the types of the indexed columns:

Example of a source using a CSV file:

source csv_test
{
    type = csvpipe
    csvpipe_command = cat /tmp/rock_bands.csv
    csvpipe_field = name
    csvpipe_attr_multi = genre_tags
}
1,"Led Zeppelin","35,23,16"
2,"Deep Purple","35,92"
3,"Frank Zappa","35,23,16,92,33,24"
¶ 12.2.5

Main+delta schema

In many situations, the total dataset is too large to be frequently rebuilt from scratch, while the number of new records remains relatively small. For example, a forum may have 1,000,000 archived posts but only receive 1,000 new posts per day.

In such cases, implementing "live" (nearly real-time) table updates can be achieved using a "main+delta" scheme.

The concept involves setting up two sources and two tables, with one "main" table for data that rarely changes (if ever), and one "delta" table for new documents. In the example, the 1,000,000 archived posts would be stored in the main table, while the 1,000 new daily posts would be placed in the delta table. The delta table can then be rebuilt frequently, making the documents available for searching within seconds or minutes. Determining which documents belong to which table and rebuilding the main table can be fully automated. One approach is to create a counter table that tracks the ID used to split the documents and update it whenever the main table is rebuilt.

Using a timestamp column as the split variable is more effective than using the ID since timestamps can track not only new documents but also modified ones.

For datasets that may contain modified or deleted documents, the delta table should provide a list of affected documents, ensuring they are suppressed and excluded from search queries. This is accomplished using a feature called Kill Lists. The document IDs to be killed can be specified in an auxiliary query defined by sql_query_killlist. The delta table must indicate the target tables for which the kill lists will be applied using the killlist_target directive. The impact of kill lists is permanent on the target table, meaning that even if a search is performed without the delta table, the suppressed documents will not appear in the search results.

Notice how we're overriding sql_query_pre in the delta source. We must explicitly include this override. If we don't, the REPLACE query would be executed during the delta source's build as well, effectively rendering it useless.

Example
# in MySQL
CREATE TABLE deltabreaker (
  index_name VARCHAR(50) NOT NULL,
  created_at TIMESTAMP NOT NULL  DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (index_name)
);

# in manticore.conf
source main {
  ...
  sql_query_pre = REPLACE INTO deltabreaker SET index_name = 'main', created_at = NOW()
  sql_query =  SELECT id, title, UNIX_TIMESTAMP(updated_at) AS updated FROM documents WHERE deleted=0 AND  updated_at  >=FROM_UNIXTIME($start) AND updated_at  <=FROM_UNIXTIME($end)
  sql_query_range  = SELECT ( SELECT UNIX_TIMESTAMP(MIN(updated_at)) FROM documents) min, ( SELECT UNIX_TIMESTAMP(created_at)-1 FROM deltabreaker WHERE index_name='main') max
  sql_query_post_index = REPLACE INTO deltabreaker set index_name = 'delta', created_at = (SELECT created_at FROM deltabreaker t WHERE index_name='main')
  ...
  sql_attr_timestamp = updated
}

source delta : main {
  sql_query_pre =
  sql_query_range = SELECT ( SELECT UNIX_TIMESTAMP(created_at) FROM deltabreaker WHERE index_name='delta') min, UNIX_TIMESTAMP() max
  sql_query_killlist = SELECT id FROM documents WHERE updated_at >=  (SELECT created_at FROM deltabreaker WHERE index_name='delta')
}

table main {
  path = /var/lib/manticore/main
  source = main
}

table delta {
  path = /var/lib/manticore/delta
  source = delta
  killlist_target = main:kl
}
¶ 12.2.6

Adding data from tables

¶ 12.2.6.1

Merging tables

Merging two existing plain tables can be more efficient than indexing the data from scratch and might be desired in some cases (such as merging 'main' and 'delta' tables instead of simply rebuilding 'main' in the 'main+delta' partitioning scheme). Thus,indexer provides an option to do that. Merging tables is typically faster than rebuilding, but still not instant for huge tables. Essentially, it needs to read the contents of both tables once and write the result once. Merging a 100 GB and 1 GB table, for example, will result in 202 GB of I/O (but that's still likely less than indexing from scratch requires).

The basic command syntax is as follows:

sudo -u manticore indexer --merge DSTINDEX SRCINDEX [--rotate] [--drop-src]

Unless --drop-src is specified, only the DSTINDEX table will be affected: the contents of SRCINDEX will be merged into it.

The --rotate switch is required if DSTINDEX is already being served by searchd.

The typical usage pattern is to merge a smaller update from SRCINDEX into DSTINDEX. Thus, when merging attributes, the values from SRCINDEX will take precedence if duplicate document IDs are encountered. However, note that the "old" keywords will not be automatically removed in such cases. For example, if there's a keyword "old" associated with document 123 in DSTINDEX, and a keyword "new" associated with it in SRCINDEX, document 123 will be found by both keywords after the merge. You can supply an explicit condition to remove documents from DSTINDEX to mitigate this; the relevant switch is --merge-dst-range:

sudo -u manticore indexer --merge main delta --merge-dst-range deleted 0 0

This switch allows you to apply filters to the destination table along with merging. There can be several filters; all of their conditions must be met in order to include the document in the resulting merged table. In the example above, the filter passes only those records where 'deleted' is 0, eliminating all records that were flagged as deleted.

--drop-src enables dropping SRCINDEX after the merge and before rotating the tables, which is important if you specify DSTINDEX in killlist_target of DSTINDEX. Otherwise, when rotating the tables, the documents that have been merged into DSTINDEX may be suppressed by SRCINDEX.

¶ 12.2.6.2

Killlist in plain tables

When using plain tables, there's a challenge arising from the need to have the data in the table as fresh as possible.

In this case, one or more secondary (also known as delta) tables are used to capture the modified data between the time the main table was created and the current time. The modified data can include new, updated, or deleted documents. The search becomes a search over the main table and the delta table. This works seamlessly when you just add new documents to the delta table, but when it comes to updated or deleted documents, there remains the following issue.

If a document is present in both the main and delta tables, it can cause issues during searching, as the engine will see two versions of a document and won't know how to pick the right one. So, the delta needs to somehow inform the search that there are deleted documents in the main table that should be disregarded. This is where kill lists come in.

Table kill-list

A table can maintain a list of document IDs that can be used to suppress records in other tables. This feature is available for plain tables using database sources or plain tables using XML sources. In the case of database sources, the source needs to provide an additional query defined by sql_query_killlist. It will store in the table a list of documents that can be used by the server to remove documents from other plain tables.

This query is expected to return a number of 1-column rows, each containing just the document ID.

In many cases, the query is a union between a query that retrieves a list of updated documents and a list of deleted documents, e.g.:

sql_query_killlist = \
    SELECT id FROM documents WHERE updated_ts>=@last_reindex UNION \
    SELECT id FROM documents_deleted WHERE deleted_ts>=@last_reindex

Removing documents in a plain table

A plain table can contain a directive called killlist_target that will tell the server it can provide a list of document IDs that should be removed from certain existing tables. The table can use either its document IDs as the source for this list or provide a separate list.

killlist_target

Sets the table(s) that the kill-list will be applied to. Optional, default value is empty.

When you use plain tables you often need to maintain not just a single table, but a set of them to be able to add/update/delete new documents sooner (read about delta table updates). n order to suppress matches in the previous (main) table that were updated or deleted in the next (delta) table, you need to:

  1. Create a kill-list in the delta table using sql_query_killlist
  2. Specify main table as killlist_target in delta table settings:
CONFIG
table products {
  killlist_target = main:kl

  path = products
  source = src_base
}

When killlist_target is specified, the kill-list is applied to all the tables listed in it on searchd startup. If any of the tables from killlist_target are rotated, the kill-list is reapplied to these tables. When the kill-list is applied, tables that were affected save these changes to disk.

killlist_target has 3 modes of operation:

  1. killlist_target = main:kl. Document IDs from the kill-list of the delta table are suppressed in the main table (see sql_query_killlist).
  2. killlist_target = main:id. All document IDs from the delta table are suppressed in the main table. The kill-list is ignored.
  3. killlist_target = main. Both document IDs from the delta table and its kill-list are suppressed in the main table.

Multiple targets can be specified, separated by commas like:

killlist_target = table_one:kl,table_two:kl

You can change the killlist_target settings for a table without rebuilding it by using ALTER.

However, since the 'old' main table has already written the changes to disk, the documents that were deleted in it will remain deleted even if it is no longer in the killlist_target of the delta table.

SQL
ALTER TABLE delta KILLLIST_TARGET='new_main_table:kl'
HTTP
POST /cli -d "
ALTER TABLE delta KILLLIST_TARGET='new_main_table:kl'"
¶ 12.2.6.3

Attaching a plain table to a real-time table

A plain table can be converted into a real-time table or added to an existing real-time table.

The first case is useful when you need to regenerate a real-time table completely, which may be needed, for example, if tokenization settings need an update. In this situation, preparing a plain table and converting it into a real-time table may be easier than preparing a batch job to perform INSERTs for adding all the data into a real-time table.

In the second case, you normally want to add a large bulk of new data to a real-time table, and again, creating a plain table with that data is easier than populating the existing real-time table.

The ATTACH statement allows you to convert a plain table to be attached to an existing real-time table. It also enables you to attach the content of one real-time table to another real-time table.

ATTACH TABLE plain_or_rt_table TO TABLE rt_table [WITH TRUNCATE]

ATTACH TABLE statement lets you move data from a plain table or a RT table to an RT table.

After a successful ATTACH the data originally stored in the source plain table becomes a part of the target RT table, and the source plain table becomes unavailable (until the next rebuild). If the source table is an RT table, its content is moved into the destination RT table, and the source RT table remains empty. ATTACH does not result in any table data changes. Essentially, it just renames the files (making the source table a new disk chunk of the target RT table) and updates the metadata. So it is generally a quick operation that might (frequently) complete as fast as under a second.

Note that when a table is attached to an empty RT table, the fields, attributes, and text processing settings (tokenizer, wordforms, etc.) from the source table are copied over and take effect. The respective parts of the RT table definition from the configuration file will be ignored.

When the TRUNCATE option is used, the RT table gets truncated prior to attaching the source plain table. This allows the operation to be atomic or ensures that the attached source plain table will be the only data in the target RT table.

ATTACH TABLE comes with a number of restrictions. Most notably, the target RT table is currently required to be either empty or have the same settings as the source table. In case the source table gets attached to a non-empty RT table, the RT table data collected so far gets stored as a regular disk chunk, and the table being attached becomes the newest disk chunk, with documents having the same IDs getting killed. The complete list of restrictions is as follows:

Example
Before the ATTACH, the RT table is empty and has 3 fields:
mysql> DESC rt;
Empty set (0.00 sec)

mysql> SELECT * FROM rt;
+-----------+---------+
| Field     | Type    |
+-----------+---------+
| id        | integer |
| testfield | field   |
| testattr  | uint    |
+-----------+---------+
3 rows in set (0.00 sec)
The plain table is not empty:
mysql> SELECT * FROM plain WHERE MATCH('test');
+------+--------+----------+------------+
| id   | weight | group_id | date_added |
+------+--------+----------+------------+
|    1 |   1304 |        1 | 1313643256 |
|    2 |   1304 |        1 | 1313643256 |
|    3 |   1304 |        1 | 1313643256 |
|    4 |   1304 |        1 | 1313643256 |
+------+--------+----------+------------+
4 rows in set (0.00 sec)
Attaching the plain table to the RT table:
mysql> ATTACH TABLE plain TO TABLE rt;
Query OK, 0 rows affected (0.00 sec)
The RT table now has 5 fields:
mysql> DESC rt;
+------------+-----------+
| Field      | Type      |
+------------+-----------+
| id         | integer   |
| title      | field     |
| content    | field     |
| group_id   | uint      |
| date_added | timestamp |
+------------+-----------+
5 rows in set (0.00 sec)
And it's not empty:
mysql> SELECT * FROM rt WHERE MATCH('test');
+------+--------+----------+------------+
| id   | weight | group_id | date_added |
+------+--------+----------+------------+
|    1 |   1304 |        1 | 1313643256 |
|    2 |   1304 |        1 | 1313643256 |
|    3 |   1304 |        1 | 1313643256 |
|    4 |   1304 |        1 | 1313643256 |
+------+--------+----------+------------+
4 rows in set (0.00 sec)
After the ATTACH, the plain table is removed and no longer available for searching:
mysql> SELECT * FROM plain WHERE MATCH('test');
ERROR 1064 (42000): no enabled local indexes to search
¶ 12.2.6.4

Importing table

If you decide to migrate from Plain mode to RT mode or in some other cases, real-time and percolate tables built in Plain mode can be imported to Manticore running in RT mode using the IMPORT TABLE statement. The general syntax is as follows:

IMPORT TABLE table_name FROM 'path'

where the 'path' parameter must be set as: /your_backup_folder/your_backup_name/data/your_table_name/your_table_name

Req
mysql -P9306 -h0 -e 'create table t(f text)'

mysql -P9306 -h0 -e "backup table t to /tmp/"

mysql -P9306 -h0 -e "drop table t"

BACKUP_NAME=$(ls /tmp | grep 'backup-' | tail -n 1)

mysql -P9306 -h0 -e "import table t from '/tmp/$BACKUP_NAME/data/t/t'

mysql -P9306 -h0 -e "show tables"

Executing this command makes all the table files of the specified table copied to data_dir. All the external table files such as wordforms, exceptions and stopwords are also copied to the same data_dir.
IMPORT TABLE has the following limitations:

indexer --print-rt

If the above method for migrating a plain table to an RT table is not possible, you may use indexer --print-rt to dump data from a plain table directly without the need to convert it to an RT type table and then import the dump into an RT table right from the command line.

This method has a few limitations though:

Req
/usr/bin/indexer --rotate --config /etc/manticoresearch/manticore.conf --print-rt my_rt_index my_plain_index > /tmp/dump_regular.sql

mysql -P $9306 -h0 -e "truncate table my_rt_index"

mysql -P 9306 -h0 < /tmp/dump_regular.sql

rm /tmp/dump_regular.sql
¶ 12.2.7

Rotating a table

Table rotation is a procedure in which the searchd server looks for new versions of defined tables in the configuration. Rotation is supported only in Plain mode of operation.

There can be two cases:

In the first case, the indexer cannot put the new version of the table online as the running copy is locked and loaded by searchd. In this case indexer needs to be called with the --rotate parameter. If rotate is used, indexer creates new table files with .new. in their names and sends a HUP signal to searchd informing it about the new version. The searchd will perform a lookup and will put the new version of the table in place and discard the old one. In some cases, it might be desired to create the new version of the table but not perform rotate as soon as possible. For example, it might be desired to first check the health of the new table versions. In this case, indexer can accept the--nohup parameter which will forbid sending the HUP signal to the server.

New tables can be loaded by rotation; however, the regular handling of the HUP signal is to check for new tables only if the configuration has changed since server startup. If the table was already defined in the configuration, the table should be first created by running indexer without rotation and perform the RELOAD TABLES statement instead.

There are also two specialized statements that can be used to perform rotations on tables:

RELOAD TABLE

RELOAD TABLE tbl [ FROM '/path/to/table_files' [ OPTION switchover=1 ] ];

The RELOAD TABLE command enables table rotation via SQL.

This command functions in three modes. In the first mode, without specifying a path, the Manticore server checks for new table files in the directory indicated by the path. New table files must be named as tbl.new.sp?.

If you specify a path, the server searches for table files in that directory, relocates them to the table path, renames them from tbl.sp? to tbl.new.sp?, and rotates them.

The third mode, activated by OPTION switchover=1, switches the index to the new path. Here, the daemon tries to load the table directly from the new path without moving the files. If loading is successful, this new index supersedes the old one.

Also, the daemon writes a unique link file (tbl.link) in the directory specified by path, maintaining persistent redirection.

If you revert a redirected index to the path specified in the configuration, the daemon will detect this and delete the link file.

Once redirected, the daemon retrieves the table from the newly linked path. When rotating, it looks for new table versions at the newly redirected path. Bear in mind, the daemon checks the configuration for common errors, like duplicate paths across different tables. However, it won't identify if multiple tables point to the same path via redirection. Under normal operations, tables are locked with the .spl file, but disabling the lock may cause problems. If there's an error (e.g., the path is inaccessible for any reason), you should manually correct (or simply delete) the link file.

indextool follows the link file, but other tools (indexer, index_converter, etc.) do not recognize the link file and consistently use the path defined in the configuration file, ignoring any redirection. Thus, you can inspect the index with indextool, and it will read from the new location. However, more complex operations like merging will not acknowledge any link file.

mysql> RELOAD TABLE plain_table;
mysql> RELOAD TABLE plain_table FROM '/home/mighty/new_table_files';
mysql> RELOAD TABLE plain_table FROM '/home/mighty/new/place/for/table/table_files' OPTION switchover=1;

RELOAD TABLES

RELOAD TABLES;

This command functions similarly to the HUP system signal, triggering a rotation of tables. Nevertheless, it doesn't exactly mirror the typical HUP signal (which can come from a kill -HUP command or indexer --rotate). This command actively searches for any tables needing rotation and is capable of re-reading the configuration. Suppose you launch Manticore in plain mode with a config file that points to a nonexistent plain table. If you then attempt to indexer --rotate the table, the new table won't be recognized by the server until you execute RELOAD TABLES or restart the server.

Depending on the value of the seamless_rotate setting, new queries might be shortly stalled, and clients will receive temporary errors.

mysql> RELOAD TABLES;
Query OK, 0 rows affected (0.01 sec)

Seamless rotate

The rotate assumes old table version is discarded and new table version is loaded and replaces the existing one. During this swapping, the server needs to also serve incoming queries made on the table that is going to be updated. To avoid stalls of the queries, the server implements a seamless rotate of the table by default, as described below.

Tables may contain data that needs to be precached in RAM. At the moment, .spa, .spb, .spi and .spm files are fully precached (they contain attribute data, blob attribute data, keyword table, and killed row map, respectively). Without seamless rotate, rotating a table tries to use as little RAM as possible and works as follows:

  1. New queries are temporarily rejected (with "retry" error code).
  2. searchd waits for all currently running queries to finish.
  3. Old table is deallocated and its files are renamed.
  4. New table files are renamed and required RAM is allocated.
  5. New table attribute and dictionary data is preloaded to RAM.
  6. searchd resumes serving queries from the new table.

However, if there's a lot of attribute or dictionary data, then the preloading step could take noticeable time - up to several minutes in case of preloading 1-5+ GB files.

With seamless rotate enabled, rotation works as follows:

  1. New table RAM storage is allocated.
  2. New table attribute and dictionary data is asynchronously preloaded to RAM.
  3. On success, old table is deallocated and both tables' files are renamed.
  4. On failure, new table is deallocated.
  5. At any given moment, queries are served either from old or new table copy.

Seamless rotate comes at the cost of higher peak memory usage during the rotation (because both old and new copies of .spa/.spb/.spi/.spm data need to be in RAM while preloading the new copy). However, average usage stays the same.

Example:

seamless_rotate = 1
¶ 12.3

Updating documents

¶ 12.3.1

REPLACE vs UPDATE

You can modify existing data in an RT or PQ table by either updating or replacing it.

UPDATE replaces row-wise attribute values of existing documents with new values. Full-text fields and columnar attributes cannot be updated. If you need to change the content of a full-text field or columnar attributes, use REPLACE.

REPLACE works similarly to INSERT except that if an old document has the same ID as the new document, the old document is marked as deleted before the new document is inserted. Note that the old document does not get physically deleted from the table. The deletion can only happen when chunks are merged in a table, e.g., as a result of an OPTIMIZE.

¶ 12.3.2

REPLACE

REPLACE works similarly to INSERT, but it marks the previous document with the same ID as deleted before inserting a new one.

If you are looking for in-place updates, please see this section.

SQL REPLACE

The syntax of the SQL REPLACE statement is as follows:

To replace the whole document:

REPLACE INTO table [(column1, column2, ...)]
    VALUES (value1, value2, ...)
    [, (...)]

To replace only selected fields:

REPLACE INTO table
    SET field1=value1[, ..., fieldN=valueN]
    WHERE id = <id>

Note, you can filter only by id in this mode.

See the examples for more details.

JSON REPLACE

POST /<table name>/_update/<id>
{
  "<field1>": <value1>,
  ...
  "<fieldN>": <valueN>
}

See the examples for more details.

SQL
REPLACE INTO products VALUES(1, "document one", 10);
Query OK, 1 row affected (0.00 sec)
REPLACE ... SET
REPLACE INTO products SET description='HUAWEI Matebook 15', price=10 WHERE id = 55;
Query OK, 1 row affected (0.00 sec)
JSON
POST /replace
-H "Content-Type: application/x-ndjson" -d '
{
  "index":"products",
  "id":1,
  "doc":
  {
    "title":"product one",
    "price":10
  }
}
'
{
  "_index":"products",
  "_id":1,
  "created":false,
  "result":"updated",
  "status":200
}
Elasticsearch-like
PUT /products/_doc/2
{
  "title": "product two",
  "price": 20
}

POST /products/_doc/3
{
  "title": "product three",
  "price": 10
}
{
"_id":2,
"_index":"products",
"_primary_term":1,
"_seq_no":0,
"_shards":{
    "failed":0,
    "successful":1,
    "total":1
},
"_type":"_doc",
"_version":1,
"result":"updated"
}

{
"_id":3,
"_index":"products",
"_primary_term":1,
"_seq_no":0,
"_shards":{
    "failed":0,
    "successful":1,
    "total":1
},
"_type":"_doc",
"_version":1,
"result":"updated"
}
Elasticsearch-like partial
POST /products/_update/55
{
  "doc": {
    "description": "HUAWEI Matebook 15",
    "price": 10
  }
}
{
"_index":"products",
"updated":1
}
PHP
$index->replaceDocument([
   'title' => 'document one',
    'price' => 10
],1);
Array(
    [_index] => products
    [_id] => 1
    [created] => false
    [result] => updated
    [status] => 200
)
Python
indexApi.replace({"index" : "products", "id" : 1, "doc" : {"title" : "document one","price":10}})
{'created': False,
 'found': None,
 'id': 1,
 'index': 'products',
 'result': 'updated'}
javascript
res = await indexApi.replace({"index" : "products", "id" : 1, "doc" : {"title" : "document one","price":10}});
{"_index":"products","_id":1,"result":"updated"}
Java
docRequest = new InsertDocumentRequest();
HashMap<String,Object> doc = new HashMap<String,Object>(){{
            put("title","document one");
            put("price",10);
}};
docRequest.index("products").id(1L).setDoc(doc);
sqlresult = indexApi.replace(docRequest);
class SuccessResponse {
    index: products
    id: 1
    created: false
    result: updated
    found: null
}
C#
Dictionary<string, Object> doc = new Dictionary<string, Object>();
doc.Add("title", "document one");
doc.Add("price", 10);
InsertDocumentRequest docRequest = new InsertDocumentRequest(index: "products", id: 1, doc: doc);
var sqlresult = indexApi.replace(docRequest);
class SuccessResponse {
    index: products
    id: 1
    created: false
    result: updated
    found: null
}
TypeScript
res = await indexApi.replace({
  index: 'test',
  id: 1,
  doc: { content: 'Text 11', name: 'Doc 11', cat: 3 },
});
{
    "_index":"test",
    "_id":1,
    "created":false
    "result":"updated"
    "status":200
}
Go
replaceDoc := map[string]interface{} {"content": "Text 11", "name": "Doc 11", "cat": 3}
replaceRequest := manticoreclient.NewInsertDocumentRequest("test", replaceDoc)
replaceRequest.SetId(1)
res, _, _ := apiClient.IndexAPI.Replace(context.Background()).InsertDocumentRequest(*replaceRequest).Execute()
{
    "_index":"test",
    "_id":1,
    "created":false
    "result":"updated"
    "status":200
}

REPLACE is available for real-time and percolate tables. You can't replace data in a plain table.

When you run a REPLACE, the previous document is not removed, but it's marked as deleted, so the table size grows until chunk merging happens. To force a chunk merge, use the OPTIMIZE statement.

Bulk replace

You can replace multiple documents at once. Check bulk adding documents for more information.

SQL
REPLACE INTO products(id,title,tag) VALUES (1, 'doc one', 10), (2,' doc two', 20);
Query OK, 2 rows affected (0.00 sec)
JSON
POST /bulk
-H "Content-Type: application/x-ndjson" -d '
{ "replace" : { "index" : "products", "id":1, "doc": { "title": "doc one", "tag" : 10 } } }
{ "replace" : { "index" : "products", "id":2, "doc": { "title": "doc two", "tag" : 20 } } }
'
{
  "items":
  [
    {
      "replace":
      {
        "_index":"products",
        "_id":1,
        "created":false,
        "result":"updated",
        "status":200
      }
    },
    {
      "replace":
      {
        "_index":"products",
        "_id":2,
        "created":false,
        "result":"updated",
        "status":200
      }
    }
  ],
  "errors":false
}
PHP
$index->replaceDocuments([
    [   
        'id' => 1,
        'title' => 'document one',
        'tag' => 10
    ],
    [   
        'id' => 2,
        'title' => 'document one',
        'tag' => 20
    ]
);
Array(
    [items] =>
    Array(
        Array(
            [_index] => products
            [_id] => 2
            [created] => false
            [result] => updated
            [status] => 200
        )
        Array(
            [_index] => products
            [_id] => 2
            [created] => false
            [result] => updated
            [status] => 200
        )
    )
    [errors => false
)
Python
indexApi = manticoresearch.IndexApi(client)
docs = [ \
    {"replace": {"index" : "products", "id" : 1, "doc" : {"title" : "document one"}}}, \
    {"replace": {"index" : "products", "id" : 2, "doc" : {"title" : "document two"}}} ]
api_resp = indexApi.bulk('\n'.join(map(json.dumps,docs)))
{'error': None,
 'items': [{u'replace': {u'_id': 1,
                         u'_index': u'products',
                         u'created': False,
                         u'result': u'updated',
                         u'status': 200}},
           {u'replace': {u'_id': 2,
                         u'_index': u'products',
                         u'created': False,
                         u'result': u'updated',
                         u'status': 200}}]}
javascript
docs = [
    {"replace": {"index" : "products", "id" : 1, "doc" : {"title" : "document one"}}},
    {"replace": {"index" : "products", "id" : 2, "doc" : {"title" : "document two"}}} ];
res =  await indexApi.bulk(docs.map(e=>JSON.stringify(e)).join('\n'));
{"items":[{"replace":{"_index":"products","_id":1,"created":false,"result":"updated","status":200}},{"replace":{"_index":"products","_id":2,"created":false,"result":"updated","status":200}}],"errors":false}
Java
body = "{\"replace\": {\"index\" : \"products\", \"id\" : 1, \"doc\" : {\"title\" : \"document one\"}}}" +"\n"+
    "{\"replace\": {\"index\" : \"products\", \"id\" : 2, \"doc\" : {\"title\" : \"document two\"}}}"+"\n" ;         
indexApi.bulk(body);
class BulkResponse {
    items: [{replace={_index=products, _id=1, created=false, result=updated, status=200}}, {replace={_index=products, _id=2, created=false, result=updated, status=200}}]
    error: null
    additionalProperties: {errors=false}
}
C#
string body = "{\"replace\": {\"index\" : \"products\", \"id\" : 1, \"doc\" : {\"title\" : \"document one\"}}}" +"\n"+
    "{\"replace\": {\"index\" : \"products\", \"id\" : 2, \"doc\" : {\"title\" : \"document two\"}}}"+"\n" ;         
indexApi.Bulk(body);
class BulkResponse {
    items: [{replace={_index=products, _id=1, created=false, result=updated, status=200}}, {replace={_index=products, _id=2, created=false, result=updated, status=200}}]
    error: null
    additionalProperties: {errors=false}
}
TypeScript
replaceDocs = [
  {
    replace: {
      index: 'test',
      id: 1,
      doc: { content: 'Text 11', cat: 1, name: 'Doc 11' },
    },
  },
  {
    replace: {
      index: 'test',
      id: 2,
      doc: { content: 'Text 22', cat: 9, name: 'Doc 22' },
    },
  },
];

res = await indexApi.bulk(
  replaceDocs.map((e) => JSON.stringify(e)).join("\n")
);
{
  "items":
  [
    {
      "replace":
      {
        "_index":"test",
        "_id":1,
        "created":false,
        "result":"updated",
        "status":200
      }
    },
    {
      "replace":
      {
        "_index":"test",
        "_id":2,
        "created":false,
        "result":"updated",
        "status":200
      }
    }
  ],
  "errors":false
}
Go
body := "{\"replace\": {\"index\": \"test\", \"id\": 1, \"doc\": {\"content\": \"Text 11\", \"name\": \"Doc 11\", \"cat\": 1 }}}" + "\n" +
    "{\"replace\": {\"index\": \"test\", \"id\": 2, \"doc\": {\"content\": \"Text 22\", \"name\": \"Doc 22\", \"cat\": 9 }}}" +"\n";
res, _, _ := apiClient.IndexAPI.Bulk(context.Background()).Body(body).Execute()
{
  "items":
  [
    {
      "replace":
      {
        "_index":"test",
        "_id":1,
        "created":false,
        "result":"updated",
        "status":200
      }
    },
    {
      "replace":
      {
        "_index":"test",
        "_id":2,
        "created":false,
        "result":"updated",
        "status":200
      }
    }
  ],
  "errors":false
}
¶ 12.3.3

UPDATE

UPDATE changes row-wise attribute values of existing documents in a specified table with new values. Note that you can't update the contents of a fulltext field or a columnar attribute. If there's such a need, use REPLACE.

Attribute updates are supported for RT, PQ, and plain tables. All attribute types can be updated as long as they are stored in the traditional row-wise storage.

Note that the document ID cannot be updated.

Note that when you update an attribute, its secondary index gets disabled, so consider replacing the document instead.

SQL
UPDATE products SET enabled=0 WHERE id=10;
Query OK, 1 row affected (0.00 sec)
JSON
POST /update

{
  "index":"products",
  "id":10,
  "doc":
  {
    "enabled":0
  }
}
{
  "_index":"products",
  "updated":1
}
PHP
$index->updateDocument([
    'enabled'=>0
],10);
Array(
    [_index] => products
    [_id] => 10
    [result] => updated
)
Python
indexApi = api = manticoresearch.IndexApi(client)
indexApi.update({"index" : "products", "id" : 1, "doc" : {"price":10}})
{'id': 1, 'index': 'products', 'result': 'updated', 'updated': None}
javascript
res = await indexApi.update({"index" : "products", "id" : 1, "doc" : {"price":10}});
{"_index":"products","_id":1,"result":"updated"}
Java
UpdateDocumentRequest updateRequest = new UpdateDocumentRequest();
doc = new HashMap<String,Object >(){{
    put("price",10);
}};
updateRequest.index("products").id(1L).setDoc(doc);
indexApi.update(updateRequest);
class UpdateResponse {
    index: products
    updated: null
    id: 1
    result: updated
}
C#
Dictionary<string, Object> doc = new Dictionary<string, Object>(); 
doc.Add("price", 10);
UpdateDocumentRequest updateRequest = new UpdateDocumentRequest(index: "products", id: 1, doc: doc);
indexApi.Update(updateRequest);
class UpdateResponse {
    index: products
    updated: null
    id: 1
    result: updated
}
TypeScript
res = await indexApi.update({ index: "test", id: 1, doc: { cat: 10 } });
{
    "_index":"test",
    "_id":1,
    "result":"updated"
}
Go
updateDoc = map[string]interface{} {"cat":10}
updateRequest = openapiclient.NewUpdateDocumentRequest("test", updateDoc)
updateRequest.SetId(1)
res, _, _ = apiClient.IndexAPI.Update(context.Background()).UpdateDocumentRequest(*updateRequest).Execute()
{
    "_index":"test",
    "_id":1,
    "result":"updated"
}

Multiple attributes can be updated in a single statement. Example:

SQL
UPDATE products
SET price=100000000000,
    coeff=3465.23,
    tags1=(3,6,4),
    tags2=()
WHERE MATCH('phone') AND enabled=1;
Query OK, 148 rows affected (0.0 sec)
JSON
POST /update
{
  "index":"products",
  "doc":
  {
    "price":100000000000,
    "coeff":3465.23,
    "tags1":[3,6,4],
    "tags2":[]
  },
  "query":
  {
    "match": { "*": "phone" },
    "equals": { "enabled": 1 }
  }
}
{
  "_index":"products",
  "updated":148
}
PHP
$query= new BoolQuery();
$query->must(new Match('phone','*'));
$query->must(new Equals('enabled',1));
$index->updateDocuments([
    'price' => 100000000000,
    'coeff' => 3465.23,
    'tags1' => [3,6,4],
    'tags2' => []
    ],
    $query
);
Array(
    [_index] => products
    [updated] => 148
)
Python
indexApi = api = manticoresearch.IndexApi(client)
indexApi.update({"index" : "products", "id" : 1, "doc" : {
    "price": 100000000000,
    "coeff": 3465.23,
    "tags1": [3,6,4],
    "tags2": []}})
{'id': 1, 'index': 'products', 'result': 'updated', 'updated': None}
javascript
res = await indexApi.update({"index" : "products", "id" : 1, "doc" : {
    "price": 100000000000,
    "coeff": 3465.23,
    "tags1": [3,6,4],
    "tags2": []}});
{"_index":"products","_id":1,"result":"updated"}
Java
UpdateDocumentRequest updateRequest = new UpdateDocumentRequest();
doc = new HashMap<String,Object >(){{
    put("price",10);
    put("coeff",3465.23);
    put("tags1",new int[]{3,6,4});
    put("tags2",new int[]{});
}};
updateRequest.index("products").id(1L).setDoc(doc);
indexApi.update(updateRequest);
class UpdateResponse {
    index: products
    updated: null
    id: 1
    result: updated
}
C#
Dictionary<string, Object> doc = new Dictionary<string, Object>(); 
doc.Add("price", 10);
doc.Add("coeff", 3465.23);
doc.Add("tags1", new List<int> {3,6,4});
doc.Add("tags2", new List<int> {});
UpdateDocumentRequest updateRequest = new UpdateDocumentRequest(index: "products", id: 1, doc: doc);
indexApi.Update(updateRequest);
class UpdateResponse {
    index: products
    updated: null
    id: 1
    result: updated
}
TypeScript
res = await indexApi.update({ index: "test", id: 1, doc: { name: "Doc 21", cat: "10" } });
{
  "_index":"test",
  "_id":1,
  "result":"updated"
}
Go
updateDoc = map[string]interface{} {"name":"Doc 21", "cat":10}
updateRequest = manticoreclient.NewUpdateDocumentRequest("test", updateDoc)
updateRequest.SetId(1)
res, _, _ = apiClient.IndexAPI.Update(context.Background()).UpdateDocumentRequest(*updateRequest).Execute()
{
  "_index":"test",
  "_id":1,
  "result":"updated"
}

When assigning out-of-range values to 32-bit attributes, they will be trimmed to their lower 32 bits without a prompt. For example, if you try to update the 32-bit unsigned int with a value of 4294967297, the value of 1 will actually be stored, because the lower 32 bits of 4294967297 (0x100000001 in hex) amount to 1 (0x00000001 in hex).

UPDATE can be used to perform partial JSON updates on numeric data types or arrays of numeric data types. Just make sure you don't update an integer value with a float value as it will be rounded off.

SQL
insert into products (id, title, meta) values (1,'title','{"tags":[1,2,3]}');

update products set meta.tags[0]=100 where id=1;
Query OK, 1 row affected (0.00 sec)

Query OK, 1 row affected (0.00 sec)
JSON
POST /insert
{
    "index":"products",
    "id":100,
    "doc":
    {
        "title":"title",
        "meta": {
            "tags":[1,2,3]
        }
    }
}

POST /update
{
    "index":"products",
    "id":100,
    "doc":
    {
        "meta.tags[0]":100
    }
}
{
   "_index":"products",
   "_id":100,
   "created":true,
   "result":"created",
   "status":201
}

{
  "_index":"products",
  "updated":1
}
PHP
$index->insertDocument([
    'title' => 'title',
    'meta' => ['tags' => [1,2,3]]
],1);
$index->updateDocument([
    'meta.tags[0]' => 100
],1);
Array(
    [_index] => products
    [_id] => 1
    [created] => true
    [result] => created
)

Array(
    [_index] => products
    [updated] => 1
)
Python
indexApi = api = manticoresearch.IndexApi(client)
indexApi.update({"index" : "products", "id" : 1, "doc" : {
    "meta.tags[0]": 100}})
{'id': 1, 'index': 'products', 'result': 'updated', 'updated': None}
javascript
res = await indexApi.update({"index" : "products", "id" : 1, "doc" : {
   "meta.tags[0]": 100}});
{"_index":"products","_id":1,"result":"updated"}
Java
UpdateDocumentRequest updateRequest = new UpdateDocumentRequest();
doc = new HashMap<String,Object >(){{
    put("meta.tags[0]",100);
}};
updateRequest.index("products").id(1L).setDoc(doc);
indexApi.update(updateRequest);
class UpdateResponse {
    index: products
    updated: null
    id: 1
    result: updated
}
C#
Dictionary<string, Object> doc = new Dictionary<string, Object>(); 
doc.Add("meta.tags[0]", 100);
UpdateDocumentRequest updateRequest = new UpdateDocumentRequest(index: "products", id: 1, doc: doc);
indexApi.Update(updateRequest);
class UpdateResponse {
    index: products
    updated: null
    id: 1
    result: updated
}
TypeScript
res = await indexApi.update({"index" : "test", "id" : 1, "doc" : { "meta.tags[0]": 100} });
{"_index":"test","_id":1,"result":"updated"}
Go
updateDoc = map[string]interface{} {"meta.tags[0]":100}
updateRequest = manticoreclient.NewUpdateDocumentRequest("test", updateDoc)
updateRequest.SetId(1)
res, _, _ = apiClient.IndexAPI.Update(context.Background()).UpdateDocumentRequest(*updateRequest).Execute()
{
    "_index":"test",
    "_id":1,
    "result":"updated"
}

Updating other data types or changing property type in a JSON attribute requires a full JSON update.

SQL
insert into products values (1,'title','{"tags":[1,2,3]}');

update products set data='{"tags":["one","two","three"]}' where id=1;
Query OK, 1 row affected (0.00 sec)

Query OK, 1 row affected (0.00 sec)
JSON
POST /insert
{
    "index":"products",
    "id":1,
    "doc":
    {
        "title":"title",
        "data":"{\"tags\":[1,2,3]}"
    }
}

POST /update
{
    "index":"products",
    "id":1,
    "doc":
    {
        "data":"{\"tags\":[\"one\",\"two\",\"three\"]}"
    }
}
{
  "_index":"products",
  "updated":1
}
PHP
$index->insertDocument([
    'title'=> 'title',
    'data' => [
         'tags' => [1,2,3]
    ]
],1);

$index->updateDocument([
    'data' => [
            'one', 'two', 'three'
    ]
],1);
Array(
    [_index] => products
    [_id] => 1
    [created] => true
    [result] => created
)

Array(
    [_index] => products
    [updated] => 1
)
Python
indexApi.insert({"index" : "products", "id" : 100, "doc" : {"title" : "title", "meta" : {"tags":[1,2,3]}}})
indexApi.update({"index" : "products", "id" : 100, "doc" : {"meta" : {"tags":['one','two','three']}}})
{'created': True,
 'found': None,
 'id': 100,
 'index': 'products',
 'result': 'created'}
{'id': 100, 'index': 'products', 'result': 'updated', 'updated': None}
javascript
res = await indexApi.insert({"index" : "products", "id" : 100, "doc" : {"title" : "title", "meta" : {"tags":[1,2,3]}}});
res = await indexApi.update({"index" : "products", "id" : 100, "doc" : {"meta" : {"tags":['one','two','three']}}});
{"_index":"products","_id":100,"created":true,"result":"created"}
{"_index":"products","_id":100,"result":"updated"}
Java
InsertDocumentRequest newdoc = new InsertDocumentRequest();
doc = new HashMap<String,Object>(){{
    put("title","title");
    put("meta",
        new HashMap<String,Object>(){{
            put("tags",new int[]{1,2,3});
        }});

}};
newdoc.index("products").id(100L).setDoc(doc);        
indexApi.insert(newdoc);

updatedoc = new UpdateDocumentRequest();
doc = new HashMap<String,Object >(){{
    put("meta",
        new HashMap<String,Object>(){{
            put("tags",new String[]{"one","two","three"});
        }});
}};
updatedoc.index("products").id(100L).setDoc(doc);
indexApi.update(updatedoc);
class SuccessResponse {
    index: products
    id: 100
    created: true
    result: created
    found: null
}

class UpdateResponse {
    index: products
    updated: null
    id: 100
    result: updated
}
C#
Dictionary<string, Object> meta = new Dictionary<string, Object>(); 
meta.Add("tags", new List<int> {1,2,3});
Dictionary<string, Object> doc = new Dictionary<string, Object>(); 
doc.Add("title", "title");
doc.Add("meta", meta);
InsertDocumentRequest newdoc = new InsertDocumentRequest(index: "products", id: 100, doc: doc);
indexApi.Insert(newdoc);

meta = new Dictionary<string, Object>();
meta.Add("tags", new List<string> {"one","two","three"}); 
doc = new Dictionary<string, Object>(); 
doc.Add("meta", meta);
UpdateDocumentRequest updatedoc = new UpdateDocumentRequest(index: "products", id: 100, doc: doc);
indexApi.Update(updatedoc);
class SuccessResponse {
    index: products
    id: 100
    created: true
    result: created
    found: null
}

class UpdateResponse {
    index: products
    updated: null
    id: 100
    result: updated
}
TypeScript
res = await indexApi.insert({
  index: 'test',
  id: 1,
  doc: { content: 'Text 1', name: 'Doc 1', meta: { tags:[1,2,3] } }
})
res = await indexApi.update({ index: 'test', id: 1, doc: { meta: { tags:['one','two','three'] } } });
{
    "_index":"test",
    "_id":1,
    "created":true,
    "result":"created"
}

{
    "_index":"test",
    "_id":1,
    "result":"updated"
}
Go
metaField := map[string]interface{} {"tags": []int{1, 2, 3}}
insertDoc := map[string]interface{} {"name": "Doc 1", "meta": metaField}}
insertRequest := manticoreclient.NewInsertDocumentRequest("test", insertDoc)
insertRequest.SetId(1)
res, _, _ := apiClient.IndexAPI.Insert(context.Background()).InsertDocumentRequest(*insertRequest).Execute();

metaField = map[string]interface{} {"tags": []string{"one", "two", "three"}}
updateDoc := map[string]interface{} {"meta": metaField}
updateRequest := manticoreclient.NewUpdateDocumentRequest("test", updateDoc)
res, _, _ = apiClient.IndexAPI.Update(context.Background()).UpdateDocumentRequest(*updateRequest).Execute()
{
    "_index":"test",
    "_id":1,
    "created":true,
    "result":"created"
}

{
    "_index":"test",
    "_id":1,
    "result":"updated"
}

When using replication, the table name should be prepended with cluster_name: (in SQL) so that updates will be propagated to all nodes in the cluster. For queries via HTTP, you should set a cluster property. See setting up replication for more information.

{
  "cluster":"nodes4",
  "index":"test",
  "id":1,
  "doc":
  {
    "gid" : 100,
    "price" : 1000
  }
}
SQL
update weekly:posts set enabled=0 where id=1;
JSON
POST /update
{
    "cluster":"weekly",
    "index":"products",
    "id":1,
    "doc":
    {
        "enabled":0
    }
}
PHP
$index->setName('products')->setCluster('weekly');
$index->updateDocument(['enabled'=>0],1);
Python
indexApi.update({"cluster":"weekly", "index" : "products", "id" : 1, "doc" : {"enabled" : 0}})
javascript
res = wait indexApi.update({"cluster":"weekly", "index" : "products", "id" : 1, "doc" : {"enabled" : 0}});
Java
updatedoc = new UpdateDocumentRequest();
doc = new HashMap<String,Object >(){{
    put("enabled",0);
}};
updatedoc.index("products").cluster("weekly").id(1L).setDoc(doc);
indexApi.update(updatedoc);
C#
Dictionary<string, Object> doc = new Dictionary<string, Object>(); 
doc.Add("enabled", 0);
UpdateDocumentRequest updatedoc = new UpdateDocumentRequest(index: "products", cluster: "weekly", id: 1, doc: doc);
indexApi.Update(updatedoc);
TypeScript
res = wait indexApi.update( {cluster: 'test_cluster', index : 'test', id : 1, doc : {name : 'Doc 11'}} );
Go
updateDoc = map[string]interface{} {"name":"Doc 11"}
updateRequest = manticoreclient.NewUpdateDocumentRequest("test", updateDoc)
updateRequest.SetCluster("test_cluster")
updateRequest.SetId(1)
res, _, _ = apiClient.IndexAPI.Update(context.Background()).UpdateDocumentRequest(*updateRequest).Execute()

Updates via SQL

Here is the syntax for the SQL UPDATE statement:

UPDATE table SET col1 = newval1 [, ...] WHERE where_condition [OPTION opt_name = opt_value [, ...]] [FORCE|IGNORE INDEX(id)]

where_condition has the same syntax as in the SELECT statement.

Multi-value attribute value sets must be specified as comma-separated lists in parentheses. To remove all values from a multi-value attribute, just assign () to it.

SQL
UPDATE products SET tags1=(3,6,4) WHERE id=1;

UPDATE products SET tags1=() WHERE id=1;
Query OK, 1 row affected (0.00 sec)

Query OK, 1 row affected (0.00 sec)
JSON
POST /update

{
    "index":"products",
    "_id":1,
    "doc":
    {
        "tags1": []
    }
}
{
  "_index":"products",
  "updated":1
}
PHP
$index->updateDocument(['tags1'=>[]],1);
Array(
    [_index] => products
    [updated] => 1
)
Python
indexApi.update({"index" : "products", "id" : 1, "doc" : {"tags1": []}})
{'id': 1, 'index': 'products', 'result': 'updated', 'updated': None}
javascript
indexApi.update({"index" : "products", "id" : 1, "doc" : {"tags1": []}})
{"_index":"products","_id":1,"result":"updated"}
Java
updatedoc = new UpdateDocumentRequest();
doc = new HashMap<String,Object >(){{
    put("tags1",new int[]{});
}};
updatedoc.index("products").id(1L).setDoc(doc);
indexApi.update(updatedoc);
class UpdateResponse {
    index: products
    updated: null
    id: 1
    result: updated
}
C#
Dictionary<string, Object> doc = new Dictionary<string, Object>(); 
doc.Add("tags1", new List<int> {});
UpdateDocumentRequest updatedoc = new UpdateDocumentRequest(index: "products", id: 1, doc: doc);
indexApi.Update(updatedoc);
class UpdateResponse {
    index: products
    updated: null
    id: 1
    result: updated
}
TypeScript
res = await indexApi.update({ index: 'test', id: 1, doc: { cat: 10 } });
{
    "_index":"test",
    "_id":1,
    "result":"updated"
}
Go
updateDoc = map[string]interface{} {"cat":10}
updateRequest = manticoreclient.NewUpdateDocumentRequest("test", updateDoc)
updateRequest.SetId(1)
res, _, _ = apiClient.IndexAPI.Update(context.Background()).UpdateDocumentRequest(*updateRequest).Execute()
{
    "_index":"test",
    "_id":1,
    "result":"updated"
}

OPTION clause is a Manticore-specific extension that lets you control a number of per-update options. The syntax is:

OPTION <optionname>=<value> [ , ... ]

The options are the same as for the SELECT statement. Specifically for the UPDATE statement, you can use these options:

Query optimizer hints

In rare cases, Manticore's built-in query analyzer may be incorrect in understanding a query and determining whether a table by ID should be used. This can result in poor performance for queries like UPDATE ... WHERE id = 123.
For information on how to force the optimizer to use a docid index, see Query optimizer hints.

Updates via HTTP JSON

Updates using HTTP JSON protocol are performed via the /update endpoint. The syntax is similar to the /insert endpoint, but this time the doc property is mandatory.

The server will respond with a JSON object stating if the operation was successful or not.

JSON
POST /update
{
  "index":"test",
  "id":1,
  "doc":
   {
     "gid" : 100,
     "price" : 1000
    }
}
{
  "_index": "test",
  "_id": 1,
  "result": "updated"
}

The ID of the document that needs to be updated can be set directly using the id property, as shown in the previous example, or you can update documents by query and apply the update to all the documents that match the query:

JSON
POST /update

{
  "index":"test",
  "doc":
  {
    "price" : 1000
  },
  "query":
  {
    "match": { "*": "apple" }
  }
}
{
  "_index":"products",
  "updated":1
}

The query syntax is the same as in the /search endpoint. Note that you can't specify id and query at the same time.

Flushing attributes

FLUSH ATTRIBUTES

The FLUSH ATTRIBUTES command flushes all in-memory attribute updates in all the active tables to disk. It returns a tag that identifies the result on-disk state, which represents the number of actual disk attribute saves performed since the server startup.

mysql> UPDATE testindex SET channel_id=1107025 WHERE id=1;
Query OK, 1 row affected (0.04 sec)

mysql> FLUSH ATTRIBUTES;
+------+
| tag  |
+------+
|    1 |
+------+
1 row in set (0.19 sec)

See also attr_flush_period setting.

Bulk updates

You can perform multiple update operations in a single call using the /bulk endpoint. This endpoint only works with data that has Content-Type set to application/x-ndjson. The data should be formatted as newline-delimited JSON (NDJSON). Essentially, this means that each line should contain exactly one JSON statement and end with a newline \n and, possibly, a \r.

JSON --Content-type=application/x-ndjson
POST /bulk

{ "update" : { "index" : "products", "id" : 1, "doc": { "price" : 10 } } }
{ "update" : { "index" : "products", "id" : 2, "doc": { "price" : 20 } } }
{
   "items":
   [
      {
         "update":
         {
            "_index":"products",
            "_id":1,
            "result":"updated"
         }
      },
      {
         "update":
         {
            "_index":"products",
            "_id":2,
            "result":"updated"
         }
      }
   ],
   "errors":false
}

The /bulk endpoint supports inserts, replaces, and deletes. Each statement begins with an action type (in this case, update). Here's a list of the supported actions:

Updates by query and deletes by query are also supported.

JSON --Content-type=application/x-ndjson
POST /bulk

{ "update" : { "index" : "products", "doc": { "coeff" : 1000 }, "query": { "range": { "price": { "gte": 1000 } } } } }
{ "update" : { "index" : "products", "doc": { "coeff" : 0 }, "query": { "range": { "price": { "lt": 1000 } } } } }
{
  "items":
  [
    {
      "update":
      {
        "_index":"products",
        "updated":1
      }
    },
    {
      "update":
      {
        "_index":"products",
        "updated":3
      }
    }
  ],
  "errors":false
}
PHP
$client->bulk([
    ['update'=>[
            'index' => 'products',
             'doc' => [
                'coeff' => 100
            ],
            'query' => [
                'range' => ['price'=>['gte'=>1000]]
            ]   
        ]
    ],
    ['update'=>[
            'index' => 'products',
             'doc' => [
                'coeff' => 0
            ],
            'query' => [
                'range' => ['price'=>['lt'=>1000]]
            ]   
        ]
    ]
]);
Array(
    [items] => Array (
        Array(
            [update] => Array(
                [_index] => products
                [updated] => 1
            )
        )   
        Array(
             [update] => Array(
                 [_index] => products
                 [updated] => 3
             )
        )    
)
Python
docs = [ \
            { "update" : { "index" : "products", "doc": { "coeff" : 1000 }, "query": { "range": { "price": { "gte": 1000 } } } } }, \
            { "update" : { "index" : "products", "doc": { "coeff" : 0 }, "query": { "range": { "price": { "lt": 1000 } } } } } ]
indexApi.bulk('\n'.join(map(json.dumps,docs)))
{'error': None,
 'items': [{u'update': {u'_index': u'products', u'updated': 1}},
           {u'update': {u'_index': u'products', u'updated': 3}}]}
javascript
docs = [
            { "update" : { "index" : "products", "doc": { "coeff" : 1000 }, "query": { "range": { "price": { "gte": 1000 } } } } },
            { "update" : { "index" : "products", "doc": { "coeff" : 0 }, "query": { "range": { "price": { "lt": 1000 } } } } } ];
res =  await indexApi.bulk(docs.map(e=>JSON.stringify(e)).join('\n'));
{"items":[{"update":{"_index":"products","updated":1}},{"update":{"_index":"products","updated":3}}],"errors":false}
Java
String   body = "{ \"update\" : { \"index\" : \"products\", \"doc\": { \"coeff\" : 1000 }, \"query\": { \"range\": { \"price\": { \"gte\": 1000 } } } }} "+"\n"+
    "{ \"update\" : { \"index\" : \"products\", \"doc\": { \"coeff\" : 0 }, \"query\": { \"range\": { \"price\": { \"lt\": 1000 } } } } }"+"\n";         
indexApi.bulk(body);
class BulkResponse {
    items: [{update={_index=products, _id=1, created=false, result=updated, status=200}}, {update={_index=products, _id=2, created=false, result=updated, status=200}}]
    error: null
    additionalProperties: {errors=false}
}
C#
string   body = "{ \"update\" : { \"index\" : \"products\", \"doc\": { \"coeff\" : 1000 }, \"query\": { \"range\": { \"price\": { \"gte\": 1000 } } } }} "+"\n"+
    "{ \"update\" : { \"index\" : \"products\", \"doc\": { \"coeff\" : 0 }, \"query\": { \"range\": { \"price\": { \"lt\": 1000 } } } } }"+"\n";         
indexApi.Bulk(body);
class BulkResponse {
    items: [{update={_index=products, _id=1, created=false, result=updated, status=200}}, {update={_index=products, _id=2, created=false, result=updated, status=200}}]
    error: null
    additionalProperties: {errors=false}
}
TypeScript
updateDocs = [
  {
    update: {
      index: 'test',
      id: 1,
      doc: { content: 'Text 11', cat: 1, name: 'Doc 11' },
    },
  },
  {
    update: {
      index: 'test',
      id: 2,
      doc: { content: 'Text 22', cat: 9, name: 'Doc 22' },
    },
  },
];

res = await indexApi.bulk(
  updateDocs.map((e) => JSON.stringify(e)).join("\n")
);
{
  "items":
  [
    {
      "update":
      {
        "_index":"test",
        "updated":1
      }
    },
    {
      "update":
      {
        "_index":"test",
        "updated":1
      }
    }
  ],
  "errors":false
}
Go
body := "{\"update\": {\"index\": \"test\", \"id\": 1, \"doc\": {\"content\": \"Text 11\", \"name\": \"Doc 11\", \"cat\": 1 }}}" + "\n" +
    "{\"update\": {\"index\": \"test\", \"id\": 2, \"doc\": {\"content\": \"Text 22\", \"name\": \"Doc 22\", \"cat\": 9 }}}" +"\n";
res, _, _ := apiClient.IndexAPI.Bulk(context.Background()).Body(body).Execute()
{
  "items":
  [
    {
      "update":
      {
        "_index":"test",
        "updated":1
      }
    },
    {
      "update":
      {
        "_index":"test",
        "updated":1
      }
    }
  ],
  "errors":false
}

Keep in mind that the bulk operation stops at the first query that results in an error.

attr_update_reserve

attr_update_reserve=size

attr_update_reserve is a per-table setting that determines the space reserved for blob attribute updates. This setting is optional, with a default value of 128k.

When blob attributes (MVAs, strings, JSON) are updated, their length may change. If the updated string (or MVA, or JSON) is shorter than the old one, it overwrites the old one in the .spb file. However, if the updated string is longer, updates are written to the end of the .spb file. This file is memory-mapped, which means resizing it may be a rather slow process, depending on the OS implementation of memory-mapped files.

To avoid frequent resizes, you can specify the extra space to be reserved at the end of the .spb file using this option.

SQL
create table products(title text, price float) attr_update_reserve = '1M'
JSON
POST /cli -d "
create table products(title text, price float) attr_update_reserve = '1M'"
PHP
$params = [
    'body' => [
        'settings' => [
            'attr_update_reserve' => '1M'
        ],
        'columns' => [
            'title'=>['type'=>'text'],
            'price'=>['type'=>'float']
        ]
    ],
    'index' => 'products'
];
$index = new \Manticoresearch\Index($client);
$index->create($params);
Python
utilsApi.sql('create table products(title text, price float) attr_update_reserve = \'1M\'')
javascript
res = await utilsApi.sql('create table products(title text, price float) attr_update_reserve = \'1M\'');
Java
utilsApi.sql("create table products(title text, price float) attr_update_reserve = '1M'");
C#
utilsApi.Sql("create table products(title text, price float) attr_update_reserve = '1M'");
TypeScript
utilsApi.sql("create table test(content text, name string, cat int) attr_update_reserve = '1M'");
Go
apiClient.UtilsAPI.Sql(context.Background()).Body("create table test(content text, name string, cat int) attr_update_reserve = '1M'").Execute()
CONFIG
table products {
  attr_update_reserve = 1M
  type = rt
  path = tbl
  rt_field = title
  rt_attr_uint = price
}

attr_flush_period

attr_flush_period = 900 # persist updates to disk every 15 minutes

When updating attributes the changes are first written to in-memory copy of attributes. This setting allows to set the interval between flushing the updates to disk. It defaults to 0, which disables the periodic flushing, but flushing will still occur at normal shut-down.

¶ 12.4

Deleting documents

Deleting documents is only supported in RT mode for the following table types:

You can delete existing documents from a table based on either their ID or certain conditions.

Also, bulk deletion is available to delete multiple documents.

Deletion of documents can be accomplished via both SQL and JSON interfaces.

For SQL, the response for a successful operation will indicate the number of rows deleted.

For JSON, the json/delete endpoint is used. The server will respond with a JSON object indicating whether the operation was successful and the number of rows deleted.

It is recommended to use table truncation instead of deletion to delete all documents from a table, as it is a much faster operation.

In this example we delete all documents that match full-text query test document from the table named test:

SQL
mysql> SELECT * FROM TEST;
+------+------+-------------+------+
| id   | gid  | mva1        | mva2 |
+------+------+-------------+------+
|  100 | 1000 | 100,201     | 100  |
|  101 | 1001 | 101,202     | 101  |
|  102 | 1002 | 102,203     | 102  |
|  103 | 1003 | 103,204     | 103  |
|  104 | 1004 | 104,204,205 | 104  |
|  105 | 1005 | 105,206     | 105  |
|  106 | 1006 | 106,207     | 106  |
|  107 | 1007 | 107,208     | 107  |
+------+------+-------------+------+
8 rows in set (0.00 sec)

mysql> DELETE FROM TEST WHERE MATCH ('test document');
Query OK, 2 rows affected (0.00 sec)

mysql> SELECT * FROM TEST;
+------+------+-------------+------+
| id   | gid  | mva1        | mva2 |
+------+------+-------------+------+
|  100 | 1000 | 100,201     | 100  |
|  101 | 1001 | 101,202     | 101  |
|  102 | 1002 | 102,203     | 102  |
|  103 | 1003 | 103,204     | 103  |
|  104 | 1004 | 104,204,205 | 104  |
|  105 | 1005 | 105,206     | 105  |
+------+------+-------------+------+
6 rows in set (0.00 sec)
JSON
POST /delete -d '
    {
        "index":"test",
        "query":
        {
            "match": { "*": "test document" }
        }
    }'
* `query` for JSON contains a clause for a full-text search; it has the same syntax as in the [JSON/update](../#Updates-via-HTTP-JSON).
{
    "_index":"test",
    "deleted":2,
}
PHP
$index->deleteDocuments(new MatchPhrase('test document','*'));
Array(
    [_index] => test
    [deleted] => 2
)
Python
indexApi.delete({"index" : "test", "query": { "match": { "*": "test document" }}})
{'deleted': 5, 'id': None, 'index': 'test', 'result': None}
javascript
res = await indexApi.delete({"index" : "test", "query": { "match": { "*": "test document" }}});
{"_index":"test","deleted":5}
Java
DeleteDocumentRequest deleteRequest = new DeleteDocumentRequest();
query = new HashMap<String,Object>();
query.put("match",new HashMap<String,Object>(){{
    put("*","test document");
}});
deleteRequest.index("test").setQuery(query);
indexApi.delete(deleteRequest);
class DeleteResponse {
    index: test
    deleted: 5
    id: null
    result: null
}
C#
Dictionary<string, Object> match = new Dictionary<string, Object>();
match.Add("*", "test document");
Dictionary<string, Object> query = new Dictionary<string, Object>();
query.Add("match", match);
DeleteDocumentRequest deleteRequest = new DeleteDocumentRequest(index: "test", query: query);
indexApi.Delete(deleteRequest);
class DeleteResponse {
    index: test
    deleted: 5
    id: null
    result: null
}
TypeScript
res = await indexApi.delete({
  index: 'test',
  query: { match: { '*': 'test document' } },
});
{"_index":"test","deleted":5}
Go
deleteRequest := manticoresearch.NewDeleteDocumentRequest("test")
matchExpr := map[string]interface{} {"*": "test document"}
deleteQuery := map[string]interface{} {"match": matchExpr }
deleteRequest.SetQuery(deleteQuery)
{"_index":"test","deleted":5}

Here - deleting a document with id equalling 1 from the table named test:

SQL
mysql> DELETE FROM TEST WHERE id=1;
Query OK, 1 rows affected (0.00 sec)
JSON
POST /delete -d '
    {
        "index": "test",
        "id": 1
    }'
* `id` for JSON is the row `id` which should be deleted.
{
    "_index": "test",
    "_id": 1,
    "found": true,
    "result": "deleted"      
}
PHP
$index->deleteDocument(1);
Array(
    [_index] => test
    [_id] => 1
    [found] => true
    [result] => deleted
)
Python
indexApi.delete({"index" : "test", "id" : 1})
{'deleted': None, 'id': 1, 'index': 'test', 'result': 'deleted'}
javascript
res = await indexApi.delete({"index" : "test", "id" : 1});
{"_index":"test","_id":1,"result":"deleted"}
Java
DeleteDocumentRequest deleteRequest = new DeleteDocumentRequest();
deleteRequest.index("test").setId(1L);
indexApi.delete(deleteRequest);
class DeleteResponse {
    index: test
    _id: 1
    result: deleted
}
C#
DeleteDocumentRequest deleteRequest = new DeleteDocumentRequest(index: "test", id: 1);
indexApi.Delete(deleteRequest);
class DeleteResponse {
    index: test
    _id: 1
    result: deleted
}
TypeScript
res = await indexApi.delete({ index: 'test', id: 1 });
{"_index":"test","_id":1,"result":"deleted"}
Go
deleteRequest := manticoresearch.NewDeleteDocumentRequest("test")
deleteRequest.SetId(1)
{"_index":"test","_id":1,"result":"deleted"}

Here, documents with id matching values from the table named test are deleted:

Note that the delete forms with id=N or id IN (X,Y) are the fastest, as they delete documents without performing a search.
Also note that the response contains only the id of the first deleted document in the corresponding _id field.

SQL
DELETE FROM TEST WHERE id IN (1,2);
Query OK, 2 rows affected (0.00 sec)
JSON
POST /delete -d '
    {
        "index":"test",
        "id": [1,2]
    }'
    {
        "_index":"test",
        "_id":1,
        "found":true,
        "result":"deleted"      
    }
PHP
$index->deleteDocumentsByIds([1,2]);
Array(
    [_index] => test
    [_id] => 1
    [found] => true
    [result] => deleted
)

Manticore SQL allows to use complex conditions for the DELETE statement.

For example here we are deleting documents that match full-text query test document and have attribute mva1 with a value greater than 206 or mva1 values 100 or 103 from table named test:

SQL
DELETE FROM TEST WHERE MATCH ('test document') AND ( mva1>206 or mva1 in (100, 103) );

SELECT * FROM TEST;
Query OK, 4 rows affected (0.00 sec)

+------+------+-------------+------+
| id   | gid  | mva1        | mva2 |
+------+------+-------------+------+
|  101 | 1001 | 101,202     | 101  |
|  102 | 1002 | 102,203     | 102  |
|  104 | 1004 | 104,204,205 | 104  |
|  105 | 1005 | 105,206     | 105  |
+------+------+-------------+------+
6 rows in set (0.00 sec)

Here is an example of deleting documents in cluster cluster's table test. Note that we must provide the cluster name property along with table property to delete a row from a table within a replication cluster:

SQL
delete from cluster:test where id=100;
JSON
POST /delete -d '
    {
      "cluster": "cluster",
      "index": "test",
      "id": 100
    }'
* `cluster` for JSON is the name of the [replication cluster](../#Replication-cluster). which contains the needed table
PHP
$index->setCluster('cluster');
$index->deleteDocument(100);
Array(
    [_index] => test
    [_id] => 100
    [found] => true
    [result] => deleted
)
Python
indexApi.delete({"cluster":"cluster","index" : "test", "id" : 1})
{'deleted': None, 'id': 1, 'index': 'test', 'result': 'deleted'}
javascript
indexApi.delete({"cluster":"cluster_1","index" : "test", "id" : 1})
{"_index":"test","_id":1,"result":"deleted"}
Java
DeleteDocumentRequest deleteRequest = new DeleteDocumentRequest();
deleteRequest.cluster("cluster").index("test").setId(1L);
indexApi.delete(deleteRequest);
class DeleteResponse {
    index: test
    _id: 1
    result: deleted
}
C#
DeleteDocumentRequest deleteRequest = new DeleteDocumentRequest(index: "test", cluster: "cluster", id: 1);
indexApi.Delete(deleteRequest);
class DeleteResponse {
    index: test
    _id: 1
    result: deleted
}
TypeScript
res = await indexApi.delete({ cluster: 'cluster_1', index: 'test', id: 1 });
{"_index":"test","_id":1,"result":"deleted"}
Go
deleteRequest := manticoresearch.NewDeleteDocumentRequest("test")
deleteRequest.SetCluster("cluster_1")
deleteRequest.SetId(1)
{"_index":"test","_id":1,"result":"deleted"}

Bulk deletion

You can also perform multiple delete operations in a single call using the /bulk endpoint. This endpoint only works with data that has Content-Type set to application/x-ndjson. The data should be formatted as newline-delimited JSON (NDJSON). Essentially, this means that each line should contain exactly one JSON statement and end with a newline \n and, possibly, a \r.

JSON --Content-type=application/x-ndjson
POST /bulk

{ "delete" : { "index" : "test", "id" : 1 } }
{ "delete" : { "index" : "test", "query": { "equals": { "int_data" : 20 } } } }
{
   "items":
   [
      {
         "bulk":
         {
            "_index":"test",
            "_id":0,
            "created":0,
            "deleted":2,
            "updated":0,
            "result":"created",
            "status":201
         }
      }
   ],
   "errors":false
}
PHP
$client->bulk([
    ['delete' => [
            'index' => 'test',
            'id' => 1
        ]
    ],
    ['delete'=>[
            'index' => 'test',
            'query' => [
                'equals' => ['int_data' => 20]
            ]
        ]
    ]
]);
Array(
    [items] => Array
        (
            [0] => Array
                (
                    [bulk] => Array
                        (
                            [_index] => test
                            [_id] => 0
                            [created] => 0
                            [deleted] => 2
                            [updated] => 0
                            [result] => created
                            [status] => 201
                        )

                )

        )

    [current_line] => 3
    [skipped_lines] => 0
    [errors] =>
    [error] =>
)
Python
docs = [ \
            { "delete" : { "index" : "test", "id": 1 } }, \
            { "delete" : { "index" : "test", "query": { "equals": { "int_data": 20 } } } } ]
indexApi.bulk('\n'.join(map(json.dumps,docs)))
{
    'error': None,
    'items': [{u'delete': {u'_index': test', u'deleted': 2}}]
}
javascript
docs = [
            { "delete" : { "index" : "test", "id": 1 } },
            { "delete" : { "index" : "test", "query": { "equals": { "int_data": 20 } } } } ];
res =  await indexApi.bulk(docs.map(e=>JSON.stringify(e)).join('\n'));
{"items":[{"delete":{"_index":"test","deleted":2}}],"errors":false}
Java
String   body = "{ "delete" : { "index" : "test", "id": 1 } } "+"\n"+
    "{ "delete" : { "index" : "test", "query": { "equals": { "int_data": 20 } } } }"+"\n";         
indexApi.bulk(body);
class BulkResponse {
    items: [{delete={_index=test, _id=0, created=false, deleted=2, result=created, status=200}}]
    error: null
    additionalProperties: {errors=false}
}
C#
string   body = "{ "delete" : { "index" : "test", "id": 1 } } "+"\n"+
    "{ "delete" : { "index" : "test", "query": { "equals": { "int_data": 20 } } } }"+"\n";         
indexApi.Bulk(body);
class BulkResponse {
    items: [{replace={_index=test, _id=0, created=false, deleted=2, result=created, status=200}}]
    error: null
    additionalProperties: {errors=false}
}
TypeScript
docs = [
  { "delete" : { "index" : "test", "id": 1 } },
  { "delete" : { "index" : "test", "query": { "equals": { "int_data": 20 } } } }
];
body = await indexApi.bulk(
  docs.map((e) => JSON.stringify(e)).join("\n")
);            
res = await indexApi.bulk(body);
{"items":[{"delete":{"_index":"test","deleted":2}}],"errors":false}
Go
docs = []string {
  `{ "delete" : { "index" : "test", "id": 1 } }`,
  `{ "delete" : { "index" : "test", "query": { "equals": { "int_data": 20 } } } }`
]
body = strings.Join(docs, "\n")
resp, httpRes, err := manticoreclient.IndexAPI.Bulk(context.Background()).Body(body).Execute()
{"items":[{"delete":{"_index":"test","deleted":2}}],"errors":false}
¶ 12.5

Transactions

Manticore supports basic transactions for deleting and inserting data into real-time and percolate tables, except when attempting to write to a distributed table which includes a real-time or percolate table. Each change to a table is first saved in an internal changeset and then actually committed to the table. By default, each command is wrapped in an individual automatic transaction, making it transparent: you simply 'insert' something and can see the inserted result after it completes, without worrying about transactions. However, this behavior can be explicitly managed by starting and committing transactions manually.

Transactions are supported for the following commands:

Transactions are not supported for:

Please note that transactions in Manticore do not aim to provide isolation. The purpose of transactions in Manticore is to allow you to accumulate multiple writes and execute them all at once upon commit, or to roll them all back if necessary. Transactions are integrated with binary log for durability and consistency.

Automatic and manual mode

SET AUTOCOMMIT = {0 | 1}

SET AUTOCOMMIT controls the autocommit mode in the active session. AUTOCOMMIT is set to 1 by default. With the default setting, you don't have to worry about transactions, as every statement that makes any changes to any table is implicitly wrapped in a separate transaction. Setting it to 0 allows you to manage transactions manually, meaning they will not be visible until you explicitly commit them.

Transactions are limited to a single real-time or percolate table and are also limited in size. They are atomic, consistent, overly isolated, and durable. Overly isolated means that the changes are not only invisible to concurrent transactions but even to the current session itself.

BEGIN, COMMIT, and ROLLBACK

START TRANSACTION | BEGIN
COMMIT
ROLLBACK

The BEGIN statement (or its START TRANSACTION alias) forcibly commits any pending transaction, if present, and starts a new one.

The COMMIT statement commits the current transaction, making all its changes permanent.

The ROLLBACK statement rolls back the current transaction, canceling all its changes.

Transactions in /bulk

When using one of the /bulk JSON endpoints ( bulk insert, bulk replace, bulk delete ), you can force a batch of documents to be committed by adding an empty line after them.

Examples

Automatic commits (default)

insert into indexrt (id, content, title, channel_id, published) values (1, 'aa', 'blabla', 1, 10);
Query OK, 1 rows affected (0.00 sec)

select * from indexrt where id=1;
+------+------------+-----------+--------+
| id   | channel_id | published | title  |
+------+------------+-----------+--------+
|    1 |          1 |        10 | blabla |
+------+------------+-----------+--------+
1 row in set (0.00 sec)

The inserted value is immediately visible in the following 'select' statement.

Manual commits (autocommit=0)

set autocommit=0;
Query OK, 0 rows affected (0.00 sec)

insert into indexrt (id, content, title, channel_id, published) values (3, 'aa', 'bb', 1, 1);
Query OK, 1 row affected (0.00 sec)

insert into indexrt (id, content, title, channel_id, published) values (4, 'aa', 'bb', 1, 1);
Query OK, 1 row affected (0.00 sec)

select * from indexrt where id=3;
Empty set (0.01 sec)

select * from indexrt where id=4;
Empty set (0.00 sec)

In this case, changes are NOT automatically committed. As a result, the insertions are not visible, even in the same session, since they have not been committed. Also, despite the absence of a BEGIN statement, a transaction is implicitly started.

To make the changes visible, you need to commit the transaction:

commit;
Query OK, 0 rows affected (0.00 sec)

select * from indexrt where id=4;
+------+------------+-----------+-------+
| id   | channel_id | published | title |
+------+------------+-----------+-------+
|    4 |          1 |         1 | bb    |
+------+------------+-----------+-------+
1 row in set (0.00 sec)

select * from indexrt where id=3;
+------+------------+-----------+-------+
| id   | channel_id | published | title |
+------+------------+-----------+-------+
|    3 |          1 |         1 | bb    |
+------+------------+-----------+-------+
1 row in set (0.00 sec)

After the commit statement, the insertions are visible in the table.

Manual transaction

By using BEGIN and COMMIT, you can define the bounds of a transaction explicitly, so there's no need to worry about autocommit in this case.

begin;
Query OK, 0 rows affected (0.00 sec)

insert into indexrt (id, content, title, channel_id, published) values (2, 'aa', 'bb', 1, 1);
Query OK, 1 row affected (0.00 sec)

select * from indexrt where id=2;
Empty set (0.01 sec)

commit;
Query OK, 0 rows affected (0.01 sec)

select * from indexrt where id=2;
+------+------------+-----------+-------+
| id   | channel_id | published | title |
+------+------------+-----------+-------+
|    2 |          1 |         1 | bb    |
+------+------------+-----------+-------+
1 row in set (0.01 sec)
¶ 13

Searching

¶ 13.1

Introduction into searching with Manticore Search

Searching is a core feature of Manticore Search. You can:

General syntax

SQL:

SELECT ... [OPTION <optionname>=<value> [ , ... ]]

HTTP:

POST /search
{   
    "index" : "index_name",
    "options":   
    {
      ...
    }
}
¶ 13.2

Full-text matching

¶ 13.2.1

MATCH

The MATCH clause allows for full-text searches in text fields. The input query string is tokenized using the same settings applied to the text during indexing. In addition to the tokenization of input text, the query string supports a number of full-text operators that enforce various rules on how keywords should provide a valid match.

Full-text match clauses can be combined with attribute filters as an AND boolean. OR relations between full-text matches and attribute filters are not supported.

The match query is always executed first in the filtering process, followed by the attribute filters. The attribute filters are applied to the result set of the match query. A query without a match clause is called a fullscan.

There must be at most one MATCH() in the SELECT clause.

Using the full-text query syntax, matching is performed across all indexed text fields of a document, unless the expression requires a match within a field (like phrase search) or is limited by field operators.

SQL

SELECT * FROM myindex WHERE MATCH('cats|birds');

The SELECT statement uses a MATCH clause, which must come after WHERE, for performing full-text searches. MATCH() accepts an input string in which all full-text operators are available.

SQL
SELECT * FROM myindex WHERE MATCH('"find me fast"/2');
+------+------+----------------+
| id   | gid  | title          |
+------+------+----------------+
|    1 |   11 | first find me  |
|    2 |   12 | second find me |
+------+------+----------------+
2 rows in set (0.00 sec)
MATCH with filters
An example of a more complex query using MATCH with WHERE filters.
SELECT * FROM myindex WHERE MATCH('cats|birds') AND (`title`='some title' AND `id`=123);

HTTP JSON

Full-text matching is available in the /search endpoint and in HTTP-based clients. The following clauses can be used for performing full-text matches:

match

"match" is a simple query that matches the specified keywords in the specified fields.

"query":
{
  "match": { "field": "keyword" }
}

You can specify a list of fields:

"match":
{
  "field1,field2": "keyword"
}

Or you can use _all or * to search all fields.

You can search all fields except one using "!field":

"match":
{
  "!field1": "keyword"
}

By default, keywords are combined using the OR operator. However, you can change that behavior using the "operator" clause:

"query":
{
  "match":
  {
    "content,title":
    {
      "query":"keyword",
      "operator":"or"
    }
  }
}

"operator" can be set to "or" or "and".

match_phrase

"match_phrase" is a query that matches the entire phrase. It is similar to a phrase operator in SQL. Here's an example:

"query":
{
  "match_phrase": { "_all" : "had grown quite" }
}

query_string

"query_string" accepts an input string as a full-text query in MATCH() syntax.

"query":
{
  "query_string": "Church NOTNEAR/3 street"
}

match_all

"match_all" accepts an empty object and returns documents from the table without performing any attribute filtering or full-text matching. Alternatively, you can just omit the query clause in the request which will have the same effect.

"query":
{
  "match_all": {}
}

Combining full-text filtering with other filters

All full-text match clauses can be combined with must, must_not, and should operators of a JSON bool query.

Examples:

match
// POST /search -d
{
    "index" : "hn_small",
    "query":
    {
        "match":
        {
            "*" : "find joe"
        }
    },
    "_source": ["story_author","comment_author"],
    "limit": 1
}
{
   "took" : 3,
   "timed_out" : false,
   "hits" : {
      "hits" : [
         {
            "_id" : "668018",
            "_score" : 3579,
            "_source" : {
               "story_author" : "IgorPartola",
               "comment_author" : "joe_the_user"
            }
         }
      ],
      "total" : 88063,
      "total_relation" : "eq"
   }
}
match_phrase
POST /search
-d
'{
    "index" : "hn_small",
    "query":
    {
        "match_phrase":
        {
            "*" : "find joe"
        }
    },
    "_source": ["story_author","comment_author"],
    "limit": 1
}'
{
   "took" : 3,
   "timed_out" : false,
   "hits" : {
      "hits" : [
         {
            "_id" : "807160",
            "_score" : 2599,
            "_source" : {
               "story_author" : "rbanffy",
               "comment_author" : "runjake"
            }
         }
      ],
      "total" : 2,
      "total_relation" : "eq"
   }
}
query_string
POST /search
-d
'{   "index" : "hn_small",
    "query":
    {
        "query_string": "@comment_text \"find joe fast \"/2"
    },
    "_source": ["story_author","comment_author"],
    "limit": 1
}'
{
  "took" : 3,
  "timed_out" : false,
  "hits" : {
      "hits" : [
         {
            "_id" : "807160",
            "_score" : 2566,
            "_source" : {
               "story_author" : "rbanffy",
               "comment_author" : "runjake"
            }
         }
      ],
      "total" : 1864,
      "total_relation" : "eq"
   }
}
PHP
$search = new Search(new Client());
$result = $search->('@title find me fast');
foreach($result as $doc)
{
    echo 'Document: '.$doc->getId();
    foreach($doc->getData() as $field=>$value)
    {
        echo $field.': '.$value;
    }
}
Document: 1
title: first find me fast
gid: 11
Document: 2
title: second find me fast
gid: 12
Python
Python
searchApi.search({"index":"hn_small","query":{"query_string":"@comment_text \"find joe fast \"/2"}, "_source": ["story_author","comment_author"], "limit":1})
{'aggregations': None,
 'hits': {'hits': [{'_id': '807160',
                    '_score': 2566,
                    '_source': {'comment_author': 'runjake',
                                'story_author': 'rbanffy'}}],
          'max_score': None,
          'total': 1864,
          'total_relation': 'eq'},
 'profile': None,
 'timed_out': False,
 'took': 2,
 'warning': None}
javascript
javascript
res = await searchApi.search({"index":"hn_small","query":{"query_string":"@comment_text \"find joe fast \"/2"}, "_source": ["story_author","comment_author"], "limit":1});
{
  took: 1,
  timed_out: false,
  hits: {
    exports: {
      total: 1864,
      total_relation: 'eq',
      hits: [
        {
          _id: '807160',
          _score: 2566,
          _source: { story_author: 'rbanffy', comment_author: 'runjake' }
        }
      ]
    }
  }
}
java
Java
query = new HashMap<String,Object>();
query.put("query_string", "@comment_text \"find joe fast \"/2");
searchRequest = new SearchRequest();
searchRequest.setIndex("hn_small");
searchRequest.setQuery(query);
searchRequest.addSourceItem("story_author");
searchRequest.addSourceItem("comment_author");
searchRequest.limit(1);
searchResponse = searchApi.search(searchRequest);
class SearchResponse {
    took: 1
    timedOut: false
    aggregations: null
    hits: class SearchResponseHits {
        maxScore: null
        total: 1864
        totalRelation: eq
        hits: [{_id=807160, _score=2566, _source={story_author=rbanffy, comment_author=runjake}}]
    }
    profile: null
    warning: null
}
C#
C#
object query =  new { query_string="@comment_text \"find joe fast \"/2" };
var searchRequest = new SearchRequest("hn_small", query);
searchRequest.Source = new List<string> {"story_author", "comment_author"};
searchRequest.Limit = 1;
SearchResponse searchResponse = searchApi.Search(searchRequest);
class SearchResponse {
    took: 1
    timedOut: false
    aggregations: null
    hits: class SearchResponseHits {
        maxScore: null
        total: 1864
        totalRelation: eq
        hits: [{_id=807160, _score=2566, _source={story_author=rbanffy, comment_author=runjake}}]
    }
    profile: null
    warning: null
}
TypeScript
TypeScript
res = await searchApi.search({
  index: 'test',
  query: { query_string: "test document 1" },
  "_source": ["content", "title"],
  limit: 1
});
{
  took: 1,
  timed_out: false,
  hits:
   exports {
     total: 5,
     total_relation: 'eq',
     hits:
      [ { _id: '1',
          _score: 2566,
          _source: { content: 'This is a test document 1', title: 'Doc 1' }
        }
      ]
   }
}
Go
Go
searchRequest := manticoresearch.NewSearchRequest("test")
query := map[string]interface{} {"query_string": "test document 1"}
searchReq.SetSource([]string{"content", "title"})
searchReq.SetLimit(1)
resp, httpRes, err := search.SearchRequest(*searchRequest).Execute()
{
  "hits": {
    "hits": [
      {
        "_id": "1",
        "_score": 2566,
        "_source": {
          "content": "This is a test document 1",
          "title": "Doc 1"
        }
      }
    ],
    "total": 5,
    "total_relation": "eq"
  },
  "timed_out": false,
  "took": 0
}
¶ 13.2.2

Full text operators

The query string can include specific operators that define the conditions for how the words from the query string should be matched.

Boolean operators

AND operator

An implicit logical AND operator is always present, so "hello world" implies that both "hello" and "world" must be found in the matching document.

hello  world

Note: There is no explicit AND operator.

OR operator

The logical OR operator | has a higher precedence than AND, so looking for cat | dog | mouse means looking for (cat | dog | mouse) rather than (looking for cat) | dog | mouse.

hello | world

Note: There is no operator OR. Please use | instead.

MAYBE operator

hello MAYBE world

The MAYBE operator functions similarly to the | operator, but it does not return documents that match only the right subtree expression.

Negation operator

hello -world
hello !world

The negation operator enforces a rule for a word to not exist.

Queries containing only negations are not supported by default. To enable, use the server option not_terms_only_allowed.

Field search operator

@title hello @body world

The field limit operator restricts subsequent searches to a specified field. By default, the query will fail with an error message if the given field name does not exist in the searched table. However, this behavior can be suppressed by specifying the @@relaxed option at the beginning of the query:

@@relaxed @nosuchfield my query

This can be useful when searching through heterogeneous tables with different schemas.

Field position limits additionally constrain the search to the first N positions within a given field (or fields). For example, @body [50] hello will not match documents where the keyword hello appears at position 51 or later in the body.

@body[50] hello

Multiple-field search operator:

@(title,body) hello world

Ignore field search operator (ignores any matches of 'hello world' from the 'title' field):

@!title hello world

Ignore multiple-field search operator (if there are fields 'title', 'subject', and 'body', then @!(title) is equivalent to @(subject,body)):

@!(title,body) hello world

All-field search operator:

@* hello

Phrase search operator

"hello world"

The phrase operator mandates that the words be adjacent to each other.

The phrase search operator can incorporate a match any term modifier. Within the phrase operator, terms are positionally significant. When the 'match any term' modifier is employed, the positions of the subsequent terms in that phrase query will be shifted. As a result, the 'match any' modifier does not affect search performance.

"exact * phrase * * for terms"

Proximity search operator

"hello world"~10

Proximity distance is measured in words, accounting for word count, and applies to all words within quotes. For example, the query "cat dog mouse"~5 indicates that there must be a span of fewer than 8 words containing all 3 words. Therefore, a document with CAT aaa bbb ccc DOG eee fff MOUSE will not match this query, as the span is exactly 8 words long.

Quorum matching operator

"the world is a wonderful place"/3

The quorum matching operator introduces a type of fuzzy matching. It will match only those documents that meet a given threshold of specified words. In the example above ("the world is a wonderful place"/3), it will match all documents containing at least 3 of the 6 specified words. The operator is limited to 255 keywords. Instead of an absolute number, you can also provide a value between 0.0 and 1.0 (representing 0% and 100%), and Manticore will match only documents containing at least the specified percentage of given words. The same example above could also be expressed as "the world is a wonderful place"/0.5, and it would match documents with at least 50% of the 6 words.

Strict order operator

aaa << bbb << ccc

The strict order operator (also known as the "before" operator) matches a document only if its argument keywords appear in the document precisely in the order specified in the query. For example, the query black << cat will match the document "black and white cat" but not the document "that cat was black". The order operator has the lowest priority. It can be applied to both individual keywords and more complex expressions. For instance, this is a valid query:

(bag of words) << "exact phrase" << red|green|blue

Exact form modifier

raining =cats and =dogs
="exact phrase"

The exact form keyword modifier matches a document only if the keyword appears in the exact form specified. By default, a document is considered a match if the stemmed/lemmatized keyword matches. For instance, the query "runs" will match both a document containing "runs" and one containing "running", because both forms stem to just "run". However, the =runs query will only match the first document. The exact form operator requires the index_exact_words option to be enabled.

Another use case is to prevent expanding a keyword to its *keyword* form. For example, with index_exact_words=1 + expand_keywords=1/star, bcd will find a document containing abcde, but =bcd will not.

As a modifier affecting the keyword, it can be used within operators such as phrase, proximity, and quorum operators. Applying an exact form modifier to the phrase operator is possible, and in this case, it internally adds the exact form modifier to all terms in the phrase.

Wildcard operators

nation* *nation* *national

Requires min_infix_len for prefix (expansion in trail) and/or suffix (expansion in head). If only prefixing is desired, min_prefix_len can be used instead.

The search will attempt to find all expansions of the wildcarded tokens, and each expansion is recorded as a matched hit. The number of expansions for a token can be controlled with the expansion_limit table setting. Wildcarded tokens can have a significant impact on query search time, especially when tokens have short lengths. In such cases, it is desirable to use the expansion limit.

The wildcard operator can be automatically applied if the expand_keywords table setting is used.

In addition, the following inline wildcard operators are supported:

The inline operators require dict=keywords and infixing enabled.

REGEX operator

REGEX(/t.?e/)

Requires the min_infix_len or min_prefix_len and dict=keywords options to be set (which is a default).

Similarly to the wildcard operators, the REGEX operator attempts to find all tokens matching the provided pattern, and each expansion is recorded as a matched hit. Note, this can have a significant impact on query search time, as the entire dictionary is scanned, and every term in the dictionary undergoes matching with the REGEX pattern.

The patterns should adhere to the RE2 syntax. The REGEX expression delimiter is the first symbol after the open bracket. In other words, all text between the open bracket followed by the delimiter and the delimiter and the closed bracket is considered as a RE2 expression.
Please note that the terms stored in the dictionary undergo charset_table transformation, meaning that for example, REGEX may not be able to match uppercase characters if all characters are lowercased according to the charset_table (which happens by default). To successfully match a term using a REGEX expression, the pattern must correspond to the entire token. To achieve partial matching, place .* at the beginning and/or end of your pattern.

REGEX(/.{3}t/)
REGEX(/t.*\d*/)

Field-start and field-end modifier

^hello world$

Field-start and field-end keyword modifiers ensure that a keyword only matches if it appears at the very beginning or the very end of a full-text field, respectively. For example, the query "^hello world$" (enclosed in quotes to combine the phrase operator with the start/end modifiers) will exclusively match documents containing at least one field with these two specific keywords.

IDF boost modifier

boosted^1.234 boostedfieldend$^1.234

The boost modifier raises the word IDF score by the indicated factor in ranking scores that incorporate IDF into their calculations. It does not impact the matching process in any manner.

NEAR operator

hello NEAR/3 world NEAR/4 "my test"

The NEAR operator is a more generalized version of the proximity operator. Its syntax is NEAR/N, which is case-sensitive and does not allow spaces between the NEAR keywords, slash sign, and distance value.

While the original proximity operator works only on sets of keywords, NEAR is more versatile and can accept arbitrary subexpressions as its two arguments. It matches a document when both subexpressions are found within N words of each other, regardless of their order. NEAR is left-associative and shares the same (lowest) precedence as BEFORE.

It is important to note that one NEAR/7 two NEAR/7 three is not exactly equivalent to "one two three"~7. The key difference is that the proximity operator allows up to 6 non-matching words between all three matching words, while the version with NEAR is less restrictive: it permits up to 6 words between one and two, and then up to 6 more between that two-word match and three.

NOTNEAR operator

Church NOTNEAR/3 street

The NOTNEAR operator serves as a negative assertion. It matches a document when the left argument is present and either the right argument is absent from the document or the right argument is a specified distance away from the end of the left matched argument. The distance is denoted in words. The syntax is NOTNEAR/N, which is case-sensitive and does not permit spaces between the NOTNEAR keyword, slash sign, and distance value. Both arguments of this operator can be terms or any operators or group of operators.

SENTENCE and PARAGRAPH operators

all SENTENCE words SENTENCE "in one sentence"
"Bill Gates" PARAGRAPH "Steve Jobs"

The SENTENCE and PARAGRAPH operators match a document when both of their arguments are within the same sentence or the same paragraph of text, respectively. These arguments can be keywords, phrases, or instances of the same operator.

The order of the arguments within the sentence or paragraph is irrelevant. These operators function only with tables built with index_sp (sentence and paragraph indexing feature) enabled and revert to a simple AND operation otherwise. For information on what constitutes a sentence and a paragraph, refer to the index_sp directive documentation.

ZONE limit operator

ZONE:(h3,h4)

only in these titles

The ZONE limit operator closely resembles the field limit operator but limits matching to a specified in-field zone or a list of zones. It is important to note that subsequent subexpressions do not need to match within a single continuous span of a given zone and may match across multiple spans. For example, the query (ZONE:th hello world) will match the following sample document:

<th>Table 1. Local awareness of Hello Kitty brand.</th>
.. some table data goes here ..
<th>Table 2. World-wide brand awareness.</th>

The ZONE operator influences the query until the next field or ZONE limit operator, or until the closing parenthesis. It functions exclusively with tables built with zone support (refer to index_zones) and will be disregarded otherwise.

ZONESPAN limit operator

ZONESPAN:(h2)

only in a (single) title

The ZONESPAN limit operator resembles the ZONE operator but mandates that the match occurs within a single continuous span. In the example provided earlier, ZONESPAN:th hello world would not match the document, as "hello" and "world" do not appear within the same span.

¶ 13.2.3

Escaping characters in query string

Since certain characters function as operators in the query string, they must be escaped to prevent query errors or unintended matching conditions.

The following characters should be escaped using a backslash (\):

!    "    $    '    (    )    -    /    <    @    \    ^    |    ~

In MySQL command line client

To escape a single quote ('), use one backslash:

SELECT * FROM your_index WHERE MATCH('l\'italiano');

For the other characters in the list mentioned earlier, which are operators or query constructs, they must be treated as simple characters by the engine, with a preceding escape character.
The backslash must also be escaped, resulting in two backslashes:

SELECT * FROM your_index WHERE MATCH('r\\&b | \\(official video\\)');

To use a backslash as a character, you must escape both the backslash as a character and the backslash as the escape operator, which requires four backslashes:

SELECT * FROM your_index WHERE MATCH('\\\\ABC');

When you are working with JSON data in Manticore Search and need to include a double quote (") within a JSON string, it's important to handle it with proper escaping. In JSON, a double quote within a string is escaped using a backslash (\). However, when inserting the JSON data through an SQL query, Manticore Search interprets the backslash (\) as an escape character within strings.

To ensure the double quote is correctly inserted into the JSON data, you need to escape the backslash itself. This results in using two backslashes (\\) before the double quote. For example:

insert into tbl(j) values('{"a": "\\"abc\\""}');

Using MySQL drivers

MySQL drivers provide escaping functions (e.g., mysqli_real_escape_string in PHP or conn.escape_string in Python), but they only escape specific characters.
You will still need to add escaping for the characters from the previously mentioned list that are not escaped by their respective functions.
Because these functions will escape the backslash for you, you only need to add one backslash.

This also applies to drivers that support (client-side) prepared statements. For example, with PHP PDO prepared statements, you need to add a backslash for the $ character:

$statement = $ln_sph->prepare( "SELECT * FROM index WHERE MATCH(:match)");
$match = '\$manticore';
$statement->bindParam(':match',$match,PDO::PARAM_STR);
$results = $statement->execute();

This results in the final query SELECT * FROM index WHERE MATCH('\\$manticore');

In HTTP JSON API

The same rules for the SQL protocol apply, with the exception that for JSON, the double quote must be escaped with a single backslash, while the rest of the characters require double escaping.

When using JSON libraries or functions that convert data structures to JSON strings, the double quote and single backslash are automatically escaped by these functions and do not need to be explicitly escaped.

In clients

The new official clients (which use the HTTP protocol) utilize common JSON libraries/functions available in their respective programming languages under the hood. The same rules for escaping mentioned earlier apply.

Escaping asterisk

The asterisk (*) is a unique character that serves two purposes:

Unlike other special characters that function as operators, the asterisk cannot be escaped when it's in a position to provide one of its functionalities.

In non-wildcard queries, the asterisk does not require escaping, whether it's in the charset_table or not.

In wildcard queries, an asterisk in the middle of a word does not require escaping. As a wildcard operator (either at the beginning or end of the word), the asterisk will always be interpreted as the wildcard operator, even if escaping is applied.

Escaping json node names in SQL

To escape special characters in JSON nodes, use a backtick. For example:

MySQL [(none)]> select * from t where json.`a=b`=234;
+---------------------+-------------+------+
| id                  | json        | text |
+---------------------+-------------+------+
| 8215557549554925578 | {"a=b":234} |      |
+---------------------+-------------+------+

MySQL [(none)]> select * from t where json.`a:b`=123;
+---------------------+-------------+------+
| id                  | json        | text |
+---------------------+-------------+------+
| 8215557549554925577 | {"a:b":123} |      |
+---------------------+-------------+------+
¶ 13.2.4

Search profiling

How a query is interpreted

Consider this complex query example:

"hello world" @title "example program"~5 @body python -(php|perl) @* code

The full meaning of this search is:

The OR operator takes precedence over AND, so "looking for cat | dog | mouse" means "looking for (cat | dog | mouse)" rather than "(looking for cat) | dog | mouse".

To comprehend how a query will be executed, Manticore Search provides query profiling tools to examine the query tree generated by a query expression.

Profiling the query tree in SQL

To enable full-text query profiling with an SQL statement, you must activate it before executing the desired query:

SET profiling =1;
SELECT * FROM test WHERE MATCH('@title abc* @body hey');

To view the query tree, execute the SHOW PLAN command immediately after running the query:

SHOW PLAN;

This command will return the structure of the executed query. Keep in mind that the 3 statements - SET profiling, the query, and SHOW - must be executed within the same session.

Profiling the query in HTTP JSON

When using the HTTP JSON protocol we can just enable "profile":true to get in response the full-text query tree structure.

{
  "index":"test",
  "profile":true,
  "query":
  {
    "match_phrase": { "_all" : "had grown quite" }
  }
}

The response will include a profile object containing a query member.

The query property holds the transformed full-text query tree. Each node consists of:

A keyword node will additionally include:

SQL
SET profiling=1;
SELECT * FROM test WHERE MATCH('@title abc* @body hey');
SHOW PLAN \G
*************************** 1\. row ***************************
Variable: transformed_tree
   Value: AND(
  OR(fields=(title), KEYWORD(abcx, querypos=1, expanded), KEYWORD(abcm, querypos=1, expanded)),
  AND(fields=(body), KEYWORD(hey, querypos=2)))
1 row in set (0.00 sec)
JSON
POST /search
{
  "index": "forum",
  "query": {"query_string": "i me"},
  "_source": { "excludes":["*"] },
  "limit": 1,
  "profile":true
}
{
  "took":1503,
  "timed_out":false,
  "hits":
  {
    "total":406301,
    "hits":
    [
       {
          "_id":"406443",
          "_score":3493,
          "_source":{}
       }
    ]
  },
  "profile":
  {
    "query":
    {
      "type":"AND",
      "description":"AND( AND(KEYWORD(i, querypos=1)),  AND(KEYWORD(me, querypos=2)))",
      "children":
      [
        {
          "type":"AND",
          "description":"AND(KEYWORD(i, querypos=1))",
          "children":
          [
            {
              "type":"KEYWORD",
              "word":"i",
              "querypos":1
            }
          ]
        },
        {
          "type":"AND",
          "description":"AND(KEYWORD(me, querypos=2))",
          "children":
          [
            {
              "type":"KEYWORD",
              "word":"me",
              "querypos":2
            }
          ]
        }
      ]
    }
  }
}
PHP
$result = $index->search('i me')->setSource(['excludes'=>['*']])->setLimit(1)->profile()->get();
print_r($result->getProfile());
Array
(
    [query] => Array
        (
            [type] => AND
            [description] => AND( AND(KEYWORD(i, querypos=1)),  AND(KEYWORD(me, querypos=2)))
            [children] => Array
                (
                    [0] => Array
                        (
                            [type] => AND
                            [description] => AND(KEYWORD(i, querypos=1))
                            [children] => Array
                                (
                                    [0] => Array
                                        (
                                            [type] => KEYWORD
                                            [word] => i
                                            [querypos] => 1
                                        )
                                )
                        )
                    [1] => Array
                        (
                            [type] => AND
                            [description] => AND(KEYWORD(me, querypos=2))
                            [children] => Array
                                (
                                    [0] => Array
                                        (
                                            [type] => KEYWORD
                                            [word] => me
                                            [querypos] => 2
                                        )
                                )
                        )
                )
        )
)
Python
Python
searchApi.search({"index":"forum","query":{"query_string":"i me"},"_source":{"excludes":["*"]},"limit":1,"profile":True})
{'hits': {'hits': [{u'_id': u'100', u'_score': 2500, u'_source': {}}],
          'total': 1},
 'profile': {u'query': {u'children': [{u'children': [{u'querypos': 1,
                                                      u'type': u'KEYWORD',
                                                      u'word': u'i'}],
                                       u'description': u'AND(KEYWORD(i, querypos=1))',
                                       u'type': u'AND'},
                                      {u'children': [{u'querypos': 2,
                                                      u'type': u'KEYWORD',
                                                      u'word': u'me'}],
                                       u'description': u'AND(KEYWORD(me, querypos=2))',
                                       u'type': u'AND'}],
                        u'description': u'AND( AND(KEYWORD(i, querypos=1)),  AND(KEYWORD(me, querypos=2)))',
                        u'type': u'AND'}},
 'timed_out': False,
 'took': 0}
javascript
javascript
res = await searchApi.search({"index":"forum","query":{"query_string":"i me"},"_source":{"excludes":["*"]},"limit":1,"profile":true});
{"hits": {"hits": [{"_id": "100", "_score": 2500, "_source": {}}],
          "total": 1},
 "profile": {"query": {"children": [{"children": [{"querypos": 1,
                                                      "type": "KEYWORD",
                                                      "word": "i"}],
                                       "description": "AND(KEYWORD(i, querypos=1))",
                                       "type": "AND"},
                                      {"children": [{"querypos": 2,
                                                      "type": "KEYWORD",
                                                      "word": "me"}],
                                       "description": "AND(KEYWORD(me, querypos=2))",
                                       "type": "AND"}],
                        "description": "AND( AND(KEYWORD(i, querypos=1)),  AND(KEYWORD(me, querypos=2)))",
                        "type": "AND"}},
 "timed_out": False,
 "took": 0}
java
Java
query = new HashMap<String,Object>();
query.put("query_string","i me");
searchRequest = new SearchRequest();
searchRequest.setIndex("forum");
searchRequest.setQuery(query);
searchRequest.setProfile(true);
searchRequest.setLimit(1);
searchRequest.setSort(new ArrayList<String>(){{
    add("*");
}});
searchResponse = searchApi.search(searchRequest);
class SearchResponse {
    took: 18
    timedOut: false
    hits: class SearchResponseHits {
        total: 1
        hits: [{_id=100, _score=2500, _source={}}]
        aggregations: null
    }
    profile: {query={type=AND, description=AND( AND(KEYWORD(i, querypos=1)),  AND(KEYWORD(me, querypos=2))), children=[{type=AND, description=AND(KEYWORD(i, querypos=1)), children=[{type=KEYWORD, word=i, querypos=1}]}, {type=AND, description=AND(KEYWORD(me, querypos=2)), children=[{type=KEYWORD, word=me, querypos=2}]}]}}
}
C#
C#
object query =  new { query_string="i me" };
var searchRequest = new SearchRequest("forum", query);
searchRequest.Profile = true;
searchRequest.Limit = 1;
searchRequest.Sort = new List<Object> { "*" };
var searchResponse = searchApi.Search(searchRequest);
class SearchResponse {
    took: 18
    timedOut: false
    hits: class SearchResponseHits {
        total: 1
        hits: [{_id=100, _score=2500, _source={}}]
        aggregations: null
    }
    profile: {query={type=AND, description=AND( AND(KEYWORD(i, querypos=1)),  AND(KEYWORD(me, querypos=2))), children=[{type=AND, description=AND(KEYWORD(i, querypos=1)), children=[{type=KEYWORD, word=i, querypos=1}]}, {type=AND, description=AND(KEYWORD(me, querypos=2)), children=[{type=KEYWORD, word=me, querypos=2}]}]}}
}
TypeScript
TypeScript
res = await searchApi.search({
  index: 'test',
  query: { query_string: 'Text' }, 
  _source: { excludes: ['*'] },
  limit: 1,
  profile: true
});
{
    "hits": 
    {
        "hits": 
        [{
            "_id": "1",
            "_score": 1480,
            "_source": {}
        }],
        "total": 1
    },
    "profile":
    {
        "query": {
            "children": 
            [{
                "children": 
                [{
                    "querypos": 1,
                    "type": "KEYWORD",
                    "word": "i"
                }],
                "description": "AND(KEYWORD(i, querypos=1))",
                "type": "AND"
            },
            {
                "children": 
                [{
                    "querypos": 2,
                    "type": "KEYWORD",
                    "word": "me"
                }],
                "description": "AND(KEYWORD(me, querypos=2))",
                "type": "AND"
            }],
            "description": "AND( AND(KEYWORD(i, querypos=1)),  AND(KEYWORD(me, querypos=2)))",
            "type": "AND"
        }
    },
    "timed_out": False,
    "took": 0
}
Go
Go
searchRequest := manticoresearch.NewSearchRequest("test")
query := map[string]interface{} {"query_string": "Text"}
source := map[string]interface{} { "excludes": []string {"*"} }
searchRequest.SetQuery(query)
searchRequest.SetSource(source)
searchReq.SetLimit(1)
searchReq.SetProfile(true)
res, _, _ := apiClient.SearchAPI.Search(context.Background()).SearchRequest(*searchRequest).Execute()
{
    "hits": 
    {
        "hits": 
        [{
            "_id": "1",
            "_score": 1480,
            "_source": {}
        }],
        "total": 1
    },
    "profile":
    {
        "query": {
            "children": 
            [{
                "children": 
                [{
                    "querypos": 1,
                    "type": "KEYWORD",
                    "word": "i"
                }],
                "description": "AND(KEYWORD(i, querypos=1))",
                "type": "AND"
            },
            {
                "children": 
                [{
                    "querypos": 2,
                    "type": "KEYWORD",
                    "word": "me"
                }],
                "description": "AND(KEYWORD(me, querypos=2))",
                "type": "AND"
            }],
            "description": "AND( AND(KEYWORD(i, querypos=1)),  AND(KEYWORD(me, querypos=2)))",
            "type": "AND"
        }
    },
    "timed_out": False,
    "took": 0
}

In some instances, the evaluated query tree may significantly differ from the original one due to expansions and other transformations.

SQL
SET profiling=1;

SELECT id FROM forum WHERE MATCH('@title way* @content hey') LIMIT 1;

SHOW PLAN;
Query OK, 0 rows affected (0.00 sec)

+--------+
| id     |
+--------+
| 711651 |
+--------+
1 row in set (0.04 sec)

+------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Variable         | Value                                                                                                                                                                                                                                                                                                                                                                                                                   |
+------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| transformed_tree | AND(
  OR(
    OR(
      AND(fields=(title), KEYWORD(wayne, querypos=1, expanded)),
      OR(
        AND(fields=(title), KEYWORD(ways, querypos=1, expanded)),
        AND(fields=(title), KEYWORD(wayyy, querypos=1, expanded)))),
    AND(fields=(title), KEYWORD(way, querypos=1, expanded)),
    OR(fields=(title), KEYWORD(way*, querypos=1, expanded))),
  AND(fields=(content), KEYWORD(hey, querypos=2))) |
+------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
JSON
POST /search
{
  "index": "forum",
  "query": {"query_string": "@title way* @content hey"},
  "_source": { "excludes":["*"] },
  "limit": 1,
  "profile":true
}
{
  "took":33,
  "timed_out":false,
  "hits":
  {
    "total":105,
    "hits":
    [
       {
          "_id":"711651",
          "_score":2539,
          "_source":{}
       }
    ]
  },
  "profile":
  {
    "query":
    {
      "type":"AND",
      "description":"AND( OR( OR( AND(fields=(title), KEYWORD(wayne, querypos=1, expanded)),  OR( AND(fields=(title), KEYWORD(ways, querypos=1, expanded)),  AND(fields=(title), KEYWORD(wayyy, querypos=1, expanded)))),  AND(fields=(title), KEYWORD(way, querypos=1, expanded)),  OR(fields=(title), KEYWORD(way*, querypos=1, expanded))),  AND(fields=(content), KEYWORD(hey, querypos=2)))",
      "children":
      [
        {
          "type":"OR",
          "description":"OR( OR( AND(fields=(title), KEYWORD(wayne, querypos=1, expanded)),  OR( AND(fields=(title), KEYWORD(ways, querypos=1, expanded)),  AND(fields=(title), KEYWORD(wayyy, querypos=1, expanded)))),  AND(fields=(title), KEYWORD(way, querypos=1, expanded)),  OR(fields=(title), KEYWORD(way*, querypos=1, expanded)))",
          "children":
          [
            {
               "type":"OR",
               "description":"OR( AND(fields=(title), KEYWORD(wayne, querypos=1, expanded)),  OR( AND(fields=(title), KEYWORD(ways, querypos=1, expanded)),  AND(fields=(title), KEYWORD(wayyy, querypos=1, expanded))))",
               "children":
               [
                 {
                   "type":"AND",
                   "description":"AND(fields=(title), KEYWORD(wayne, querypos=1, expanded))",
                   "fields":["title"],
                   "max_field_pos":0,
                   "children":
                   [
                     {
                       "type":"KEYWORD",
                       "word":"wayne",
                       "querypos":1,
                       "expanded":true
                     }
                   ]
                 },
                 {
                   "type":"OR",
                   "description":"OR( AND(fields=(title), KEYWORD(ways, querypos=1, expanded)),  AND(fields=(title), KEYWORD(wayyy, querypos=1, expanded)))",
                   "children":
                   [
                     {
                       "type":"AND",
                       "description":"AND(fields=(title), KEYWORD(ways, querypos=1, expanded))",
                       "fields":["title"],
                       "max_field_pos":0,
                       "children":
                       [
                         {
                           "type":"KEYWORD",
                           "word":"ways",
                           "querypos":1,
                           "expanded":true
                         }
                       ]
                     },
                     {
                       "type":"AND",
                       "description":"AND(fields=(title), KEYWORD(wayyy, querypos=1, expanded))",
                       "fields":["title"],
                       "max_field_pos":0,
                       "children":
                       [
                         {
                           "type":"KEYWORD",
                           "word":"wayyy",
                           "querypos":1,
                           "expanded":true
                         }
                       ]
                     }
                   ]
                 }
               ]
            },
            {
              "type":"AND",
              "description":"AND(fields=(title), KEYWORD(way, querypos=1, expanded))",
              "fields":["title"],
              "max_field_pos":0,
              "children":
              [
                 {
                    "type":"KEYWORD",
                    "word":"way",
                    "querypos":1,
                    "expanded":true
                 }
              ]
            },
            {
              "type":"OR",
              "description":"OR(fields=(title), KEYWORD(way*, querypos=1, expanded))",
              "fields":["title"],
              "max_field_pos":0,
              "children":
              [
                {
                  "type":"KEYWORD",
                  "word":"way*",
                  "querypos":1,
                  "expanded":true
                }
              ]
            }
          ]
        },
        {
          "type":"AND",
          "description":"AND(fields=(content), KEYWORD(hey, querypos=2))",
          "fields":["content"],
          "max_field_pos":0,
          "children":
          [
            {
              "type":"KEYWORD",
              "word":"hey",
              "querypos":2
            }
          ]
        }
      ]
    }
  }
}
PHP
$result = $index->search('@title way* @content hey')->setSource(['excludes'=>['*']])->setLimit(1)->profile()->get();
print_r($result->getProfile());
Array
(
    [query] => Array
        (
            [type] => AND
            [description] => AND( OR( OR( AND(fields=(title), KEYWORD(wayne, querypos=1, expanded)),  OR( AND(fields=(title), KEYWORD(ways, querypos=1, expanded)),  AND(fields=(title), KEYWORD(wayyy, querypos=1, expanded)))),  AND(fields=(title), KEYWORD(way, querypos=1, expanded)),  OR(fields=(title), KEYWORD(way*, querypos=1, expanded))),  AND(fields=(content), KEYWORD(hey, querypos=2)))
            [children] => Array
                (
                    [0] => Array
                        (
                            [type] => OR
                            [description] => OR( OR( AND(fields=(title), KEYWORD(wayne, querypos=1, expanded)),  OR( AND(fields=(title), KEYWORD(ways, querypos=1, expanded)),  AND(fields=(title), KEYWORD(wayyy, querypos=1, expanded)))),  AND(fields=(title), KEYWORD(way, querypos=1, expanded)),  OR(fields=(title), KEYWORD(way*, querypos=1, expanded)))
                            [children] => Array
                                (
                                    [0] => Array
                                        (
                                            [type] => OR
                                            [description] => OR( AND(fields=(title), KEYWORD(wayne, querypos=1, expanded)),  OR( AND(fields=(title), KEYWORD(ways, querypos=1, expanded)),  AND(fields=(title), KEYWORD(wayyy, querypos=1, expanded))))
                                            [children] => Array
                                                (
                                                    [0] => Array
                                                        (
                                                            [type] => AND
                                                            [description] => AND(fields=(title), KEYWORD(wayne, querypos=1, expanded))
                                                            [fields] => Array
                                                                (
                                                                    [0] => title
                                                                )
                                                            [max_field_pos] => 0
                                                            [children] => Array
                                                                (
                                                                    [0] => Array
                                                                        (
                                                                            [type] => KEYWORD
                                                                            [word] => wayne
                                                                            [querypos] => 1
                                                                            [expanded] => 1
                                                                        )
                                                                )
                                                        )
                                                    [1] => Array
                                                        (
                                                            [type] => OR
                                                            [description] => OR( AND(fields=(title), KEYWORD(ways, querypos=1, expanded)),  AND(fields=(title), KEYWORD(wayyy, querypos=1, expanded)))
                                                            [children] => Array
                                                                (
                                                                    [0] => Array
                                                                        (
                                                                            [type] => AND
                                                                            [description] => AND(fields=(title), KEYWORD(ways, querypos=1, expanded))
                                                                            [fields] => Array
                                                                                (
                                                                                    [0] => title
                                                                                )

                                                                            [max_field_pos] => 0
                                                                            [children] => Array
                                                                                (
                                                                                    [0] => Array
                                                                                        (
                                                                                            [type] => KEYWORD
                                                                                            [word] => ways
                                                                                            [querypos] => 1
                                                                                            [expanded] => 1
                                                                                        )
                                                                                )
                                                                        )
                                                                    [1] => Array
                                                                        (
                                                                            [type] => AND
                                                                            [description] => AND(fields=(title), KEYWORD(wayyy, querypos=1, expanded))
                                                                            [fields] => Array
                                                                                (
                                                                                    [0] => title
                                                                                )
                                                                            [max_field_pos] => 0
                                                                            [children] => Array
                                                                                (
                                                                                    [0] => Array
                                                                                        (
                                                                                            [type] => KEYWORD
                                                                                            [word] => wayyy
                                                                                            [querypos] => 1
                                                                                            [expanded] => 1
                                                                                        )
                                                                                )
                                                                        )
                                                                )
                                                        )
                                                )
                                        )
                                    [1] => Array
                                        (
                                            [type] => AND
                                            [description] => AND(fields=(title), KEYWORD(way, querypos=1, expanded))
                                            [fields] => Array
                                                (
                                                    [0] => title
                                                )
                                            [max_field_pos] => 0
                                            [children] => Array
                                                (
                                                    [0] => Array
                                                        (
                                                            [type] => KEYWORD
                                                            [word] => way
                                                            [querypos] => 1
                                                            [expanded] => 1
                                                        )
                                                )
                                        )
                                    [2] => Array
                                        (
                                            [type] => OR
                                            [description] => OR(fields=(title), KEYWORD(way*, querypos=1, expanded))
                                            [fields] => Array
                                                (
                                                    [0] => title
                                                )
                                            [max_field_pos] => 0
                                            [children] => Array
                                                (
                                                    [0] => Array
                                                        (
                                                            [type] => KEYWORD
                                                            [word] => way*
                                                            [querypos] => 1
                                                            [expanded] => 1
                                                        )
                                                )
                                        )
                                )
                        )
                    [1] => Array
                        (
                            [type] => AND
                            [description] => AND(fields=(content), KEYWORD(hey, querypos=2))
                            [fields] => Array
                                (
                                    [0] => content
                                )
                            [max_field_pos] => 0
                            [children] => Array
                                (
                                    [0] => Array
                                        (
                                            [type] => KEYWORD
                                            [word] => hey
                                            [querypos] => 2
                                        )
                                )
                        )
                )
        )
)
Python
Python
searchApi.search({"index":"forum","query":{"query_string":"@title way* @content hey"},"_source":{"excludes":["*"]},"limit":1,"profile":true})
{'hits': {'hits': [{u'_id': u'2811025403043381551',
                    u'_score': 2643,
                    u'_source': {}}],
          'total': 1},
 'profile': {u'query': {u'children': [{u'children': [{u'expanded': True,
                                                      u'querypos': 1,
                                                      u'type': u'KEYWORD',
                                                      u'word': u'way*'}],
                                       u'description': u'AND(fields=(title), KEYWORD(way*, querypos=1, expanded))',
                                       u'fields': [u'title'],
                                       u'type': u'AND'},
                                      {u'children': [{u'querypos': 2,
                                                      u'type': u'KEYWORD',
                                                      u'word': u'hey'}],
                                       u'description': u'AND(fields=(content), KEYWORD(hey, querypos=2))',
                                       u'fields': [u'content'],
                                       u'type': u'AND'}],
                        u'description': u'AND( AND(fields=(title), KEYWORD(way*, querypos=1, expanded)),  AND(fields=(content), KEYWORD(hey, querypos=2)))',
                        u'type': u'AND'}},
 'timed_out': False,
 'took': 0}
javascript
javascript
res = await searchApi.search({"index":"forum","query":{"query_string":"@title way* @content hey"},"_source":{"excludes":["*"]},"limit":1,"profile":true});
{"hits": {"hits": [{"_id": "2811025403043381551",
                    "_score": 2643,
                    "_source": {}}],
          "total": 1},
 "profile": {"query": {"children": [{"children": [{"expanded": True,
                                                      "querypos": 1,
                                                      "type": "KEYWORD",
                                                      "word": "way*"}],
                                       "description": "AND(fields=(title), KEYWORD(way*, querypos=1, expanded))",
                                       "fields": ["title"],
                                       "type": "AND"},
                                      {"children": [{"querypos": 2,
                                                      "type": "KEYWORD",
                                                      "word": "hey"}],
                                       "description": "AND(fields=(content), KEYWORD(hey, querypos=2))",
                                       "fields": ["content"],
                                       "type": "AND"}],
                        "description": "AND( AND(fields=(title), KEYWORD(way*, querypos=1, expanded)),  AND(fields=(content), KEYWORD(hey, querypos=2)))",
                        "type": "AND"}},
 "timed_out": False,
 "took": 0}
java
Java
query = new HashMap<String,Object>();
query.put("query_string","@title way* @content hey");
searchRequest = new SearchRequest();
searchRequest.setIndex("forum");
searchRequest.setQuery(query);
searchRequest.setProfile(true);
searchRequest.setLimit(1);
searchRequest.setSort(new ArrayList<String>(){{
    add("*");
}});
searchResponse = searchApi.search(searchRequest);
class SearchResponse {
    took: 18
    timedOut: false
    hits: class SearchResponseHits {
        total: 1
        hits: [{_id=2811025403043381551, _score=2643, _source={}}]
        aggregations: null
    }
    profile: {query={type=AND, description=AND( AND(fields=(title), KEYWORD(way*, querypos=1, expanded)),  AND(fields=(content), KEYWORD(hey, querypos=2))), children=[{type=AND, description=AND(fields=(title), KEYWORD(way*, querypos=1, expanded)), fields=[title], children=[{type=KEYWORD, word=way*, querypos=1, expanded=true}]}, {type=AND, description=AND(fields=(content), KEYWORD(hey, querypos=2)), fields=[content], children=[{type=KEYWORD, word=hey, querypos=2}]}]}}
}
C#
C#
object query =  new { query_string="@title way* @content hey" };
var searchRequest = new SearchRequest("forum", query);
searchRequest.Profile = true;
searchRequest.Limit = 1;
searchRequest.Sort = new List<Object> { "*" };
var searchResponse = searchApi.Search(searchRequest);
class SearchResponse {
    took: 18
    timedOut: false
    hits: class SearchResponseHits {
        total: 1
        hits: [{_id=2811025403043381551, _score=2643, _source={}}]
        aggregations: null
    }
    profile: {query={type=AND, description=AND( AND(fields=(title), KEYWORD(way*, querypos=1, expanded)),  AND(fields=(content), KEYWORD(hey, querypos=2))), children=[{type=AND, description=AND(fields=(title), KEYWORD(way*, querypos=1, expanded)), fields=[title], children=[{type=KEYWORD, word=way*, querypos=1, expanded=true}]}, {type=AND, description=AND(fields=(content), KEYWORD(hey, querypos=2)), fields=[content], children=[{type=KEYWORD, word=hey, querypos=2}]}]}}
}
TypeScript
TypeScript
res = await searchApi.search({
  index: 'test',
  query: { query_string: '@content 1'},
  _source: { excludes: ["*"] },
  limit:1,
  profile":true
});
{
    "hits": 
    {
        "hits": 
        [{
            "_id": "1",
            "_score": 1480,
            "_source": {}
        }],
        "total": 1
    },
    "profile": 
    {
        "query": 
        {
            "children": 
            [{
                "children": 
                [{
                    "expanded": True,
                    "querypos": 1,
                    "type": "KEYWORD",
                    "word": "1*"
                }],
                "description": "AND(fields=(content), KEYWORD(1*, querypos=1, expanded))",
                "fields": ["content"],
                "type": "AND"
            }],
            "description": "AND(fields=(content), KEYWORD(1*, querypos=1))",
            "type": "AND"
        }},
    "timed_out": False,
    "took": 0
}
Go
Go
searchRequest := manticoresearch.NewSearchRequest("test")
query := map[string]interface{} {"query_string": "1*"}
source := map[string]interface{} { "excludes": []string {"*"} }
searchRequest.SetQuery(query)
searchRequest.SetSource(source)
searchReq.SetLimit(1)
searchReq.SetProfile(true)
res, _, _ := apiClient.SearchAPI.Search(context.Background()).SearchRequest(*searchRequest).Execute()
{
    "hits": 
    {
        "hits": 
        [{
            "_id": "1",
            "_score": 1480,
            "_source": {}
        }],
        "total": 1
    },
    "profile": 
    {
        "query": 
        {
            "children": 
            [{
                "children": 
                [{
                    "expanded": True,
                    "querypos": 1,
                    "type": "KEYWORD",
                    "word": "1*"
                }],
                "description": "AND(fields=(content), KEYWORD(1*, querypos=1, expanded))",
                "fields": ["content"],
                "type": "AND"
            }],
            "description": "AND(fields=(content), KEYWORD(1*, querypos=1))",
            "type": "AND"
        }},
    "timed_out": False,
    "took": 0
}

Profiling without running a query

The SQL statement EXPLAIN QUERY enables the display of the execution tree for a given full-text query without performing an actual search query on the table.

SQL
EXPLAIN QUERY index_base '@title running @body dog'\G
 EXPLAIN QUERY index_base '@title running @body dog'\G

*************************** 1\. row ***************************
Variable: transformed_tree
   Value: AND(
      OR(
            AND(fields=(title), KEYWORD(run, querypos=1, morphed)),
            AND(fields=(title), KEYWORD(running, querypos=1, morphed))))
  AND(fields=(body), KEYWORD(dog, querypos=2, morphed)))

EXPLAIN QUERY ... option format=dot allows displaying the execution tree of a provided full-text query in a hierarchical format suitable for visualization by existing tools, such as https://dreampuf.github.io/GraphvizOnline:

EXPLAIN QUERY graphviz example

SQL
EXPLAIN QUERY tbl 'i me' option format=dot\G
EXPLAIN QUERY tbl 'i me' option format=dot\G

*************************** 1. row ***************************
Variable: transformed_tree
   Value: digraph "transformed_tree"
{

0 [shape=record,style=filled,bgcolor="lightgrey" label="AND"]
0 -> 1
1 [shape=record,style=filled,bgcolor="lightgrey" label="AND"]
1 -> 2
2 [shape=record label="i | { querypos=1 }"]
0 -> 3
3 [shape=record,style=filled,bgcolor="lightgrey" label="AND"]
3 -> 4
4 [shape=record label="me | { querypos=2 }"]
}

Viewing the match factors values

When using an expression ranker, it's possible to reveal the values of the calculated factors with the PACKEDFACTORS() function.

The function returns:

These values can be utilized to understand why certain documents receive lower or higher scores in a search or to refine the existing ranking expression.

Example:

SQL
SELECT id, PACKEDFACTORS() FROM test1 WHERE MATCH('test one') OPTION ranker=expr('1')\G
             id: 1
packedfactors(): bm25=569, bm25a=0.617197, field_mask=2, doc_word_count=2,
    field1=(lcs=1, hit_count=2, word_count=2, tf_idf=0.152356,
        min_idf=-0.062982, max_idf=0.215338, sum_idf=0.152356, min_hit_pos=4,
        min_best_span_pos=4, exact_hit=0, max_window_hits=1, min_gaps=2,
        exact_order=1, lccs=1, wlccs=0.215338, atc=-0.003974),
    word0=(tf=1, idf=-0.062982),
    word1=(tf=1, idf=0.215338)
1 row in set (0.00 sec)
¶ 13.2.5

Boolean optimization

Queries can be automatically optimized if OPTION boolean_simplify=1 is specified. Some transformations performed by this optimization include:

Queries like -dog, which could potentially include all documents from the collection are not allowed by default. To allow them, you must specify not_terms_only_allowed=1 either as a global setting or as a search option.

¶ 13.3

Search results

SQL

When you run a query via SQL over the MySQL protocol, you receive the requested columns as a result or an empty result set if nothing is found.

SQL
SELECT * FROM tbl;
+------+------+--------+
| id   | age  | name   |
+------+------+--------+
|    1 |   25 | joe    |
|    2 |   25 | mary   |
|    3 |   33 | albert |
+------+------+--------+
3 rows in set (0.00 sec)

Additionally, you can use the SHOW META call to see extra meta-information about the latest query.

SQL
SELECT id,story_author,comment_author FROM hn_small WHERE story_author='joe' LIMIT 3; SHOW META;
++--------+--------------+----------------+
| id     | story_author | comment_author |
+--------+--------------+----------------+
| 152841 | joe          | SwellJoe       |
| 161323 | joe          | samb           |
| 163735 | joe          | jsjenkins168   |
+--------+--------------+----------------+
3 rows in set (0.01 sec)

+----------------+-------+
| Variable_name  | Value |
+----------------+-------+
| total          | 3     |
| total_found    | 20    |
| total_relation | gte   |
| time           | 0.010 |
+----------------+-------+
4 rows in set (0.00 sec)

In some cases, such as when performing a faceted search, you may receive multiple result sets as a response to your SQL query.

SQL
SELECT * FROM tbl WHERE MATCH('joe') FACET age;
+------+------+
| id   | age  |
+------+------+
|    1 |   25 |
+------+------+
1 row in set (0.00 sec)

+------+----------+
| age  | count(*) |
+------+----------+
|   25 |        1 |
+------+----------+
1 row in set (0.00 sec)

In case of a warning, the result set will include a warning flag, and you can see the warning using SHOW WARNINGS.

SQL
SELECT * from tbl where match('"joe"/3'); show warnings;
+------+------+------+
| id   | age  | name |
+------+------+------+
|    1 |   25 | joe  |
+------+------+------+
1 row in set, 1 warning (0.00 sec)

+---------+------+--------------------------------------------------------------------------------------------+
| Level   | Code | Message                                                                                    |
+---------+------+--------------------------------------------------------------------------------------------+
| warning | 1000 | quorum threshold too high (words=1, thresh=3); replacing quorum operator with AND operator |
+---------+------+--------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

If your query fails, you will receive an error:

SQL
SELECT * from tbl where match('@surname joe');
ERROR 1064 (42000): index idx: query error: no field 'surname' found in schema

HTTP

Via the HTTP JSON interface, the query result is sent as a JSON document. Example:

{
  "took":10,
  "timed_out": false,
  "hits":
  {
    "total": 2,
    "hits":
    [
      {
        "_id": "1",
        "_score": 1,
        "_source": { "gid": 11 }
      },
      {
        "_id": "2",
        "_score": 1,
        "_source": { "gid": 12 }
      }
    ]
  }
}

The query result can also include query profile information. See Query profile.

Each match in the hits array has the following properties:

Source selection

By default, all attributes are returned in the _source array. You can use the _source property in the request payload to select the fields you want to include in the result set. Example:

{
  "index":"test",
  "_source":"attr*",
  "query": { "match_all": {} }
}

You can specify the attributes you want to include in the query result as a string ("_source": "attr*") or as an array of strings ("_source": [ "attr1", "attri*" ]"). Each entry can be an attribute name or a wildcard (*, % and ? symbols are supported).

You can also explicitly specify which attributes you want to include and which to exclude from the result set using the includes and excludes properties:

"_source":
{
  "includes": [ "attr1", "attri*" ],
  "excludes": [ "*desc*" ]
}

An empty list of includes is interpreted as "include all attributes," while an empty list of excludes does not match anything. If an attribute matches both the includes and excludes, then the excludes win.

¶ 13.4

Filters

WHERE

WHERE is an SQL clause that works for both full-text matching and additional filtering. The following operators are available:

MATCH('query') is supported and maps to a full-text query.

The {col_name | expr_alias} [NOT] IN @uservar condition syntax is supported. Refer to the SET syntax for a description of global user variables.

HTTP JSON

If you prefer the HTTP JSON interface, you can also apply filtering. It might seem more complex than SQL, but it is recommended for cases when you need to prepare a query programmatically, such as when a user fills out a form in your application.

Here's an example of several filters in a bool query.

This full-text query matches all documents containing product in any field. These documents must have a price greater than or equal to 500 (gte) and less than or equal to 1000 (lte). All of these documents must not have a revision less than 15 (lt).

JSON
POST /search
{
  "index": "test1",
  "query": {
    "bool": {
      "must": [
        { "match" : { "_all" : "product" } },
        { "range": { "price": { "gte": 500, "lte": 1000 } } }
      ],
      "must_not": {
        "range": { "revision": { "lt": 15 } }
      }
    }
  }
}

bool query

The bool query matches documents based on boolean combinations of other queries and/or filters. Queries and filters must be specified in must, should, or must_not sections and can be nested.

JSON
POST /search
{
  "index":"test1",
  "query": {
    "bool": {
      "must": [
        { "match": {"_all":"keyword"} },
        { "range": { "revision": { "gte": 14 } } }
      ]
    }
  }
}

must

Queries and filters specified in the must section are required to match the documents. If multiple fulltext queries or filters are specified, all of them must match. This is the equivalent of AND queries in SQL. Note that if you want to match against an array (multi-value attribute), you can specify the attribute multiple times. The result will be positive only if all the queried values are found in the array, e.g.:

"must": [
  {"equals" : { "product_codes": 5 }},
  {"equals" : { "product_codes": 6 }}
]

Note also, it may be better in terms of performance to use:

  {"in" : { "all(product_codes)": [5,6] }}

(see details below).

should

Queries and filters specified in the should section should match the documents. If some queries are specified in must or must_not, should queries are ignored. On the other hand, if there are no queries other than should, then at least one of these queries must match a document for it to match the bool query. This is the equivalent of OR queries. Note, if you want to match against an array (multi-value attribute) you can specify the attribute multiple times, e.g.:

"should": [
  {"equals" : { "product_codes": 7 }},
  {"equals" : { "product_codes": 8 }}
]

Note also, it may be better in terms of performance to use:

  {"in" : { "any(product_codes)": [7,8] }}

(see details below).

must_not

Queries and filters specified in the must_not section must not match the documents. If several queries are specified under must_not, the document matches if none of them match.

JSON
POST /search
{
  "index":"t",
  "query": {
    "bool": {
      "should": [
        {
          "equals": {
            "b": 1
          }
        },
        {
          "equals": {
            "b": 3
          }
        }
      ],
      "must": [
        {
          "equals": {
            "a": 1
          }
        }
      ],
      "must_not": {
        "equals": {
          "b": 2
        }
      }
    }
  }
}

Nested bool query

A bool query can be nested inside another bool so you can make more complex queries. To make a nested boolean query just use another bool instead of must, should or must_not. Here is how this query:

a = 2 and (a = 10 or b = 0)

should be presented in JSON.

JSON
a = 2 and (a = 10 or b = 0)
POST /search
{
  "index":"t",
  "query": {
    "bool": {
      "must": [
        {
          "equals": {
            "a": 2
          }
        },
        {
          "bool": {
            "should": [
              {
                "equals": {
                  "a": 10
                }
              },
              {
                "equals": {
                  "b": 0
                }
              }
            ]
          }
        }
      ]
    }
  }
}

More complex query:

(a = 1 and b = 1) or (a = 10 and b = 2) or (b = 0)
JSON
(a = 1 and b = 1) or (a = 10 and b = 2) or (b = 0)
POST /search
{
  "index":"t",
  "query": {
    "bool": {
      "should": [
        {
          "bool": {
            "must": [
              {
                "equals": {
                  "a": 1
                }
              },
              {
                "equals": {
                  "b": 1
                }
              }
            ]
          }
        },
        {
          "bool": {
            "must": [
              {
                "equals": {
                  "a": 10
                }
              },
              {
                "equals": {
                  "b": 2
                }
              }
            ]
          }
        },
        {
          "bool": {
            "must": [
              {
                "equals": {
                  "b": 0
                }
              }
            ]
          }

        }
      ]
    }
  }
}

Queries in SQL format

Queries in SQL format (query_string) can also be used in bool queries.

JSON
POST /search
{
  "index": "test1",
  "query": {
    "bool": {
      "must": [
        { "query_string" : "product" },
        { "query_string" : "good" }
      ]
    }
  }
}

Various filters

Equality filters

Equality filters are the simplest filters that work with integer, float and string attributes.

JSON
POST /search
{
  "index":"test1",
  "query": {
    "equals": { "price": 500 }
  }
}

Filter equals can be applied to a multi-value attribute and you can use:

JSON
POST /search
{
  "index":"test1",
  "query": {
    "equals": { "any(price)": 100 }
  }
}

Set filters

Set filters check if attribute value is equal to any of the values in the specified set.

Set filters support integer, string and multi-value attributes.

JSON
POST /search
{
  "index":"test1",
  "query": {
    "in": {
      "price": [1,10,100]
    }
  }
}

When applied to a multi-value attribute you can use:

JSON
POST /search
{
  "index":"test1",
  "query": {
    "in": {
      "all(price)": [1,10]
    }
  }
}

Range filters

Range filters match documents that have attribute values within a specified range.

Range filters support the following properties:

JSON
POST /search
{
  "index":"test1",
  "query": {
    "range": {
      "price": {
        "gte": 500,
        "lte": 1000
      }
    }
  }
}

Geo distance filters

geo_distance filters are used to filter the documents that are within a specific distance from a geo location.

Specifies the pin location, in degrees. Distances are calculated from this point.

Specifies the attributes that contain latitude and longitude.

Specifies distance calculation function. Can be either adaptive or haversine. adaptive is faster and more precise, for more details see GEODIST(). Optional, defaults to adaptive.

Specifies the maximum distance from the pin locations. All documents within this distance match. The distance can be specified in various units. If no unit is specified, the distance is assumed to be in meters. Here is a list of supported distance units:

location_anchor and location_source properties accept the following latitude/longitude formats:

Latitude and longitude are specified in degrees.

Basic example
POST /search
{
  "index":"test",
  "query": {
    "geo_distance": {
      "location_anchor": {"lat":49, "lon":15},
      "location_source": {"attr_lat, attr_lon"},
      "distance_type": "adaptive",
      "distance":"100 km"
    }
  }
}
Advanced example
`geo_distance` can be used as a filter in bool queries along with matches or other attribute filters.
POST /search
{
  "index": "geodemo",
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "*": "station"
          }
        },
        {
          "equals": {
            "state_code": "ENG"
          }
        },
        {
          "geo_distance": {
            "distance_type": "adaptive",
            "location_anchor": {
              "lat": 52.396,
              "lon": -1.774
            },
            "location_source": "latitude_deg,longitude_deg",
            "distance": "10000 m"
          }
        }
      ]
    }
  }
}
¶ 13.5

Expressions in search

Manticore enables the use of arbitrary arithmetic expressions through both SQL and HTTP, incorporating attribute values, internal attributes (document ID and relevance weight), arithmetic operations, several built-in functions, and user-defined functions. Below is the complete reference list for quick access.

Arithmetic operators

+, -, *, /, %, DIV, MOD

Standard arithmetic operators are available. Arithmetic calculations involving these operators can be executed in three different modes:

  1. using single-precision, 32-bit IEEE 754 floating point values (default),
  2. using signed 32-bit integers,
  3. using 64-bit signed integers.

The expression parser automatically switches to integer mode if no operations result in a floating point value. Otherwise, it uses the default floating point mode. For example, a+b will be computed using 32-bit integers if both arguments are 32-bit integers; or using 64-bit integers if both arguments are integers but one of them is 64-bit; or in floats otherwise. However, a/b or sqrt(a) will always be computed in floats, as these operations return a non-integer result. To avoid this, you can use IDIV(a,b) or a DIV b form. Additionally, a*b will not automatically promote to 64-bit when arguments are 32-bit. To enforce 64-bit results, use BIGINT(), but note that if non-integer operations are present, BIGINT() will simply be ignored.

Comparison operators

<, > <=, >=, =, <>

The comparison operators return 1.0 when the condition is true and 0.0 otherwise. For example, (a=b)+3 evaluates to 4 when attribute a is equal to attribute b, and to 3 when a is not. Unlike MySQL, the equality comparisons (i.e., = and <> operators) include a small equality threshold (1e-6 by default). If the difference between the compared values is within the threshold, they are considered equal.

The BETWEEN and IN operators, in the case of multi-value attributes, return true if at least one value matches the condition (similar to ANY()). The IN operator does not support JSON attributes. The IS (NOT) NULL operator is supported only for JSON attributes.

Boolean operators

AND, OR, NOT

Boolean operators (AND, OR, NOT) behave as expected. They are left-associative and have the lowest priority compared to other operators. NOT has higher priority than AND and OR but still less than any other operator. AND and OR share the same priority, so using parentheses is recommended to avoid confusion in complex expressions.

Bitwise operators

&, |

These operators perform bitwise AND and OR respectively. The operands must be of integer types.

Functions:

Expressions in HTTP JSON

In the HTTP JSON interface, expressions are supported via script_fields and expressions.

script_fields

{
    "index": "test",
    "query": {
        "match_all": {}
    }, "script_fields": {
        "add_all": {
            "script": {
                "inline": "( gid * 10 ) | crc32(title)"
            }
        },
        "title_len": {
            "script": {
                "inline": "crc32(title)"
            }
        }
    }
}

In this example, two expressions are created: add_all and title_len. The first expression calculates ( gid * 10 ) | crc32(title) and stores the result in the add_all attribute. The second expression calculates crc32(title) and stores the result in the title_len attribute.

Currently, only inline expressions are supported. The value of the inline property (the expression to compute) has the same syntax as SQL expressions.

The expression name can be utilized in filtering or sorting.

script_fields
{
    "index":"movies_rt",
    "script_fields":{
        "cond1":{
            "script":{
                "inline":"actor_2_facebook_likes =296 OR movie_facebook_likes =37000"
            }
        },
        "cond2":{
            "script":{
                "inline":"IF (IN (content_rating,'TV-PG','PG'),2, IF(IN(content_rating,'TV-14','PG-13'),1,0))"
            }
        }
    },
    "limit":10,
    "sort":[
        {
            "cond2":"desc"
        },
        {
            "actor_1_name":"asc"
        },
        {
            "actor_2_name":"desc"
        }
    ],
    "profile":true,
    "query":{
        "bool":{
            "must":[
                {
                    "match":{
                        "*":"star"
                    }
                },
                {
                    "equals":{
                        "cond1":1
                    }
                }
            ],
            "must_not":[
                {
                    "equals":{
                        "content_rating":"R"
                    }
                }
            ]
        }
    }
}

By default, expression values are included in the _source array of the result set. If the source is selective (see Source selection), the expression name can be added to the _source parameter in the request. Note, the names of the expressions must be in lowercase.

expressions

expressions is an alternative to script_fields with a simpler syntax. The example request adds two expressions and stores the results into add_all and title_len attributes. Note, the names of the expressions must be in lowercase.

expressions
{
  "index": "test",
  "query": { "match_all": {} },
  "expressions":
  {
      "add_all": "( gid * 10 ) | crc32(title)",
      "title_len": "crc32(title)"
  }
}
¶ 13.6

Search options

The SQL SELECT clause and the HTTP /search endpoint support a number of options that can be used to fine-tune search behavior.

OPTION

General syntax

SQL:

SELECT ... [OPTION <optionname>=<value> [ , ... ]] [/*+ [NO_][ColumnarScan|DocidIndex|SecondaryIndex(<attribute>[,...])]] /*]

HTTP:

POST /search
{   
    "index" : "index_name",
    "options":   
    {
        "optionname": "value",
        "optionname2": <value2>
    }
}

SQL:

SQL
SELECT * FROM test WHERE MATCH('@title hello @body world')
OPTION ranker=bm25, max_matches=3000,
field_weights=(title=10, body=3), agent_query_timeout=10000
+------+-------+-------+
| id   | title | body  |
+------+-------+-------+
|    1 | hello | world |
+------+-------+-------+
1 row in set (0.00 sec)
JSON:
JSON
POST /search
{   
    "index" : "test",
    "query": {
      "match": {
        "title": "hello"
      },
      "match": {
        "body": "world"     
      }
    },
    "options":   
    {
        "ranker": "bm25",
        "max_matches": 3000,
        "field_weights": {
            "title": 10,
            "body": 3
        },
        "agent_query_timeout": 10000
    }
}
{
  "took": 0,
  "timed_out": false,
  "hits": {
    "total": 1,
    "total_relation": "eq",
    "hits": [
      {
        "_id": "1",
        "_score": 10500,
        "_source": {
          "title": "hello",
          "body": "world"
        }
      }
    ]
  }
}

Supported options are:

accurate_aggregation

Integer. Enables or disables guaranteed aggregate accuracy when running groupby queries in multiple threads. Default is 0.

When running a groupby query, it can be run in parallel on a plain table with several pseudo shards (if pseudo_sharding is on). A similar approach works on RT tables. Each shard/chunk executes the query, but the number of groups is limited by max_matches. If the result sets from different shards/chunks have different groups, the group counts and aggregates may be inaccurate. Note that Manticore tries to increase max_matches up to max_matches_increase_threshold based on the number of unique values of the groupby attribute (retrieved from secondary indexes). If it succeeds, there will be no loss in accuracy.

However, if the number of unique values of the groupby attribute is high, further increasing max_matches may not be a good strategy because it can lead to a loss in performance and higher memory usage. Setting accurate_aggregation to 1 forces groupby searches to run in a single thread, which fixes the accuracy issue. Note that running in a single thread is only enforced when max_matches cannot be set high enough; otherwise, searches with accurate_aggregation=1 will still run in multiple threads.

Overall, setting accurate_aggregation to 1 ensures group count and aggregate accuracy in RT tables and plain tables with pseudo_sharding=1. The drawback is that searches will run slower since they will be forced to operate in a single thread.

However, if we have an RT table and a plain table containing the same data, and we run a query with accurate_aggregation=1, we might still receive different results. This occurs because the daemon might choose different max_matches settings for the RT and plain table due to the max_matches_increase_threshold setting.

agent_query_timeout

Integer. Max time in milliseconds to wait for remote queries to complete, see this section.

boolean_simplify

0 or 1 (0 by default). boolean_simplify=1 enables simplifying the query to speed it up.

comment

String, user comment that gets copied to a query log file.

cutoff

Integer. Max found matches threshold. The value is selected automatically if not specified.

In case Manticore cannot calculate the exact matching documents count, you will see total_relation: gte in the query meta information, which means that the actual count is Greater Than or Equal to the total (total_found in SHOW META via SQL, hits.total in JSON via HTTP). If the total value is precise, you'll get total_relation: eq.

distinct_precision_threshold

Integer. Default is 3500. This option sets the threshold below which counts returned by count distinct are guaranteed to be exact within a plain table.

Accepted values range from 500 to 15500. Values outside this range will be clamped.

When this option is set to 0, it enables an algorithm that ensures exact counts. This algorithm collects {group, value} pairs, sorts them, and periodically eliminates duplicates. The result is precise counts within a plain table. However, this approach is not suitable for high-cardinality datasets due to its high memory consumption and slow query execution.

When distinct_precision_threshold is set to a value greater than 0, Manticore employs a different algorithm. It loads counts into a hash table and returns the size of the table. If the hash table becomes too large, its contents are moved into a HyperLogLog data structure. At this point, the counts become approximate because HyperLogLog is a probabilistic algorithm. This approach maintains a fixed maximum memory usage per group, but there is a tradeoff in count accuracy.

The accuracy of the HyperLogLog and the threshold for converting from the hash table to HyperLogLog are derived from the distinct_precision_threshold setting. It's important to use this option with caution since doubling its value will also double the maximum memory required to calculate counts. The maximum memory usage can be roughly estimated using this formula: 64 * max_matches * distinct_precision_threshold, although in practice, count calculations often use less memory than the worst-case scenario.

expand_keywords

0 or 1 (0 by default). Expands keywords with exact forms and/or stars when possible. Refer to expand_keywords for more details.

field_weights

Named integer list (per-field user weights for ranking).

Example:

SELECT ... OPTION field_weights=(title=10, body=3)

global_idf

Use global statistics (frequencies) from the global_idf file for IDF computations.

idf

Quoted, comma-separated list of IDF computation flags. Known flags are:

The historically default IDF (Inverse Document Frequency) in Manticore is equivalent to OPTION idf='normalized,tfidf_normalized', and those normalizations may cause several undesired effects.

First, idf=normalized causes keyword penalization. For instance, if you search for the | something and the occurs in more than 50% of the documents, then documents with both keywords the and something will get less weight than documents with just one keyword something. Using OPTION idf=plain avoids this. Plain IDF varies in [0, log(N)] range, and keywords are never penalized; while the normalized IDF varies in [-log(N), log(N)] range, and too frequent keywords are penalized.

Second, idf=tfidf_normalized leads to IDF drift across queries. Historically, IDF was also divided by the query keyword count, ensuring the entire sum(tf*idf) across all keywords remained within the [0,1] range. However, this meant that queries like word1 and word1 | nonmatchingword2 would assign different weights to the exact same result set, as the IDFs for both word1 and nonmatchingword2 would be divided by 2. Using OPTION idf='tfidf_unnormalized' resolves this issue. Keep in mind that BM25, BM25A, BM25F() ranking factors will be adjusted accordingly when you disable this normalization.

IDF flags can be combined; plain and normalized are mutually exclusive; tfidf_unnormalized and tfidf_normalized are also mutually exclusive; and unspecified flags in such mutually exclusive groups default to their original settings. This means OPTION idf=plain is the same as specifying OPTION idf='plain,tfidf_normalized' in its entirety.

index_weights

Named integer list. Per-table user weights for ranking.

local_df

0 or 1, automatically sum DFs over all local parts of a distributed table, ensuring consistent (and accurate) IDF across a locally sharded table. Enabled dy default for disk chunks of the RT table.

Low Priority

0 or 1 (0 by default). Setting low_priority=1 executes the query with a lower priority, rescheduling its jobs 10 times less frequently than other queries with normal priority.

max_matches

Integer. Per-query max matches value.

The maximum number of matches that the server retains in RAM for each table and can return to the client. The default is 1000.

Introduced to control and limit RAM usage, the max_matches setting determines how many matches will be kept in RAM while searching each table. Every match found is still processed, but only the best N of them will be retained in memory and returned to the client in the end. For example, suppose a table contains 2,000,000 matches for a query. It's rare that you would need to retrieve all of them. Instead, you need to scan all of them but only choose the "best" 500, for instance, based on some criteria (e.g., sorted by relevance, price, or other factors) and display those 500 matches to the end user in pages of 20 to 100 matches. Tracking only the best 500 matches is much more RAM and CPU efficient than keeping all 2,000,000 matches, sorting them, and then discarding everything but the first 20 needed for the search results page. max_matches controls the N in that "best N" amount.

This parameter significantly impacts per-query RAM and CPU usage. Values of 1,000 to 10,000 are generally acceptable, but higher limits should be used with caution. Carelessly increasing max_matches to 1,000,000 means that searchd will have to allocate and initialize a 1-million-entry matches buffer for every query. This will inevitably increase per-query RAM usage and, in some cases, can noticeably affect performance.

Refer to max_matches_increase_threshold for additional information on how it can influence the behavior of the max_matches option.

max_matches_increase_threshold

Integer. Sets the threshold that max_matches can be increased to. Default is 16384.

Manticore may increase max_matches to enhance groupby and/or aggregation accuracy when pseudo_sharding is enabled, and if it detects that the number of unique values of the groupby attribute is less than this threshold. Loss of accuracy may occur when pseudo-sharding executes the query in multiple threads or when an RT table conducts parallel searches in disk chunks.

If the number of unique values of the groupby attribute is less than the threshold, max_matches will be set to this number. Otherwise, the default max_matches will be used.

If max_matches was explicitly set in query options, this threshold has no effect.

Keep in mind that if this threshold is set too high, it will result in increased memory consumption and general performance degradation.

You can also enforce a guaranteed groupby/aggregate accuracy mode using the accurate_aggregation option.

max_query_time

Sets the maximum search query time in milliseconds. Must be a non-negative integer. The default value is 0, which means "do not limit." Local search queries will be stopped once the specified time has elapsed. Note that if you're performing a search that queries multiple local tables, this limit applies to each table separately. Be aware that this may slightly increase the query's response time due to the overhead caused by constantly tracking whether it's time to stop the query.

max_predicted_time

Integer. Maximum predicted search time; see predicted_time_costs.

morphology

none allows replacing all query terms with their exact forms if the table was built with index_exact_words enabled. This is useful for preventing stemming or lemmatizing query terms.

not_terms_only_allowed

0 or 1 allows standalone negation for the query. The default is 0. See also the corresponding global setting.

SQL
MySQL [(none)]> select * from tbl where match('-donald');
ERROR 1064 (42000): index t: query error: query is non-computable (single NOT operator)
MySQL [(none)]> select * from t where match('-donald') option not_terms_only_allowed=1;
+---------------------+-----------+
| id                  | field     |
+---------------------+-----------+
| 1658178727135150081 | smth else |
+---------------------+-----------+

ranker

Choose from the following options:

For more details on each ranker, refer to Search results ranking.

rand_seed

Allows you to specify a specific integer seed value for an ORDER BY RAND() query, for example: ... OPTION rand_seed=1234. By default, a new and different seed value is autogenerated for every query.

retry_count

Integer. Distributed retries count.

retry_delay

Integer. Distributed retry delay, in milliseconds.

sort_method

threads

Limits the max number of threads used for current query processing. Default - no limit (the query can occupy all threads as defined globally).
For a batch of queries, the option must be attached to the very first query in the batch, and it is then applied when the working queue is created and is effective for the entire batch. This option has the same meaning as the option max_threads_per_query, but is applied only to the current query or batch of queries.

token_filter

Quoted, colon-separated string of library name:plugin name:optional string of settings. A query-time token filter is created for each search when full-text is invoked by every table involved, allowing you to implement a custom tokenizer that generates tokens according to custom rules.

SELECT * FROM index WHERE MATCH ('yes@no') OPTION token_filter='mylib.so:blend:@'

expansion_limit

Restricts the maximum number of expanded keywords for a single wildcard, with a default value of 0 indicating no limit. For additional details, refer to expansion_limit.

Query optimizer hints

In rare cases, Manticore's built-in query analyzer may be incorrect in understanding a query and determining whether a docid index, secondary indexes, or columnar scan should be used. To override the query optimizer's decisions, you can use the following hints in your query:

Note that when executing a full-text query with filters, the query optimizer decides between intersecting the results of the full-text tree with the filter results or using a standard match-then-filter approach. Specifying any hint will force the daemon to use the code path that performs the intersection of the full-text tree results with the filter results.

For more information on how the query optimizer works, refer to the Cost based optimizer page.

SQL
SELECT * FROM students where age > 21 /*+ SecondaryIndex(age) */

When using a MySQL/MariaDB client, make sure to include the --comments flag to enable the hints in your queries.

mysql
mysql -P9306 -h0 --comments
¶ 13.7

Highlighting

Highlighting enables you to obtain highlighted text fragments (referred to as snippets) from documents containing matching keywords.

The SQL HIGHLIGHT() function, the "highlight" property in JSON queries via HTTP, and the highlight() function in the PHP client all utilize the built-in document storage to retrieve the original field contents (enabled by default).

SQL
SELECT HIGHLIGHT() FROM books WHERE MATCH('try');
+----------------------------------------------------------+
| highlight()                                              |
+----------------------------------------------------------+
| Don`t <b>try</b> to compete in childishness, said Bliss. |
+----------------------------------------------------------+
1 row in set (0.00 sec)
JSON
POST /search
{
  "index": "books",
  "query":  {  "match": { "*" : "try" }  },
  "highlight": {}
}
{
  "took":1,
  "timed_out":false,
  "hits":
  {
    "total":1,
    "hits":
    [
      {
        "_id":"4",
        "_score":1704,
        "_source":
        {
          "title":"Book four",
          "content":"Don`t try to compete in childishness, said Bliss."
        },
        "highlight":
        {
          "title": ["Book four"],
          "content": ["Don`t <b>try</b> to compete in childishness, said Bliss."]
        }
      }
    ]
  }
}
PHP
$results = $index->search('try')->highlight()->get();
foreach($results as $doc)
{
    echo 'Document: '.$doc->getId();
    foreach($doc->getData() as $field=>$value)
    {
        echo $field.': '.$value;
    }
    foreach($doc->getHighlight() as $field=>$snippets)
    {
        echo "Highlight for ".$field.":\n";
        foreach($snippets as $snippet)
        {
            echo "- ".$snippet."\n";
        }
    }
}
Document: 14
title: Book four
content: Don`t try to compete in childishness, said Bliss.
Highlight for title:
- Book four
Highlight for content:
- Don`t <b>try</b> to compete in childishness, said Bliss.
Python
res = searchApi.search({"index":"books","query":{"match":{"*":"try"}},"highlight":{}})
{'aggregations': None,
 'hits': {'hits': [{u'_id': u'4',
                    u'_score': 1695,
                    u'_source': {u'content': u'Don`t try to compete in childishness, said Bliss.',
                                 u'title': u'Book four'},
                    u'highlight': {u'content': [u'Don`t <b>try</b> to compete in childishness, said Bliss.'],
                                   u'title': [u'Book four']}}],
          'max_score': None,
          'total': 1},
 'profile': None,
 'timed_out': False,
 'took': 0}
Javascript
res =  await searchApi.search({"index":"books","query":{"match":{"*":"try"}},"highlight":{}});
{"took":0,"timed_out":false,"hits":{"total":1,"hits":[{"_id":"4","_score":1695,"_source":{"title":"Book four","content":"Don`t try to compete in childishness, said Bliss."},"highlight":{"title":["Book four"],"content":["Don`t <b>try</b> to compete in childishness, said Bliss."]}}]}}
Java
searchRequest = new SearchRequest();
searchRequest.setIndex("books");
query = new HashMap<String,Object>();
query.put("match",new HashMap<String,Object>(){{
    put("*","try|gets|down|said");
}});        
searchRequest.setQuery(query);
highlight = new HashMap<String,Object>(){{

}};
searchRequest.setHighlight(highlight);
searchResponse = searchApi.search(searchRequest);
class SearchResponse {
    took: 0
    timedOut: false
    hits: class SearchResponseHits {
        total: 3
        maxScore: null
        hits: [{_id=3, _score=1597, _source={title=Book three, content=Trevize whispered, "It gets infantile pleasure out of display. I`d love to knock it down."}, highlight={title=[Book three], content=[, "It <b>gets</b> infantile pleasure ,  to knock it <b>down</b>."]}}, {_id=4, _score=1563, _source={title=Book four, content=Don`t try to compete in childishness, said Bliss.}, highlight={title=[Book four], content=[Don`t <b>try</b> to compete in childishness, <b>said</b> Bliss.]}}, {_id=5, _score=1514, _source={title=Books two, content=A door opened before them, revealing a small room. Bander said, "Come, half-humans, I want to show you how we live."}, highlight={title=[Books two], content=[ a small room. Bander <b>said</b>, "Come, half-humans, I]}}]
        aggregations: null
    }
    profile: null
}
C#
var searchRequest = new SearchRequest("books");
searchRequest.FulltextFilter = new MatchFilter("*", "try|gets|down|said");
var highlight = new Highlight();
searchRequest.Highlight = highlight;
var searchResponse = searchApi.Search(searchRequest);
class SearchResponse {
    took: 0
    timedOut: false
    hits: class SearchResponseHits {
        total: 3
        maxScore: null
        hits: [{_id=3, _score=1597, _source={title=Book three, content=Trevize whispered, "It gets infantile pleasure out of display. I`d love to knock it down."}, highlight={title=[Book three], content=[, "It <b>gets</b> infantile pleasure ,  to knock it <b>down</b>."]}}, {_id=4, _score=1563, _source={title=Book four, content=Don`t try to compete in childishness, said Bliss.}, highlight={title=[Book four], content=[Don`t <b>try</b> to compete in childishness, <b>said</b> Bliss.]}}, {_id=5, _score=1514, _source={title=Books two, content=A door opened before them, revealing a small room. Bander said, "Come, half-humans, I want to show you how we live."}, highlight={title=[Books two], content=[ a small room. Bander <b>said</b>, "Come, half-humans, I]}}]
        aggregations: null
    }
    profile: null
}
TypeScript
res = await searchApi.search({
  index: 'test',
  query: {
    match: {

        *: 'Text 1'
    }
  },
  highlight: {}
});
{
    "took":0,
    "timed_out":false,
    "hits":
    {
        "total":1,
        "hits":
        [{
            "_id":"1",
            "_score":1480,
            "_source":
            {
                "content":"Text 1"
            },
            "highlight":
            {
                "content":
                [
                    "<b>Text 1</b>"
                ]
            }
        ]}
    }
}
Go
matchClause := map[string]interface{} {"*": "Text 1"};
query := map[string]interface{} {"match": matchClause};
searchRequest.SetQuery(query);
highlight := manticoreclient.NewHighlight()
searchRequest.SetHighlight(highlight)
res, _, _ := apiClient.SearchAPI.Search(context.Background()).SearchRequest(*searchRequest).Execute()
{
    "took":0,
    "timed_out":false,
    "hits":
    {
        "total":1,
        "hits":
        [{
            "_id":"1",
            "_score":1480,
            "_source":
            {
                "content":"Text 1"
            },
            "highlight":
            {
                "content":
                [
                    "<b>Text 1</b>"
                ]
            }
        ]}
    }
}

When using SQL for highlighting search results, you will receive snippets from various fields combined into a single string due to the limitations of the MySQL protocol. You can adjust the concatenation separators with the field_separator and snippet_separator options, as detailed below.

When executing JSON queries through HTTP or using the PHP client, there are no such constraints, and the result set includes an array of fields containing arrays of snippets (without separators).

Keep in mind that snippet generation options like limit, limit_words, and limit_snippets apply to each field individually by default. You can alter this behavior using the limits_per_field option, but it could lead to unwanted results. For example, one field may have matching keywords, but no snippets from that field are included in the result set because they didn't rank as high as snippets from other fields in the highlighting engine.

The highlighting algorithm currently prioritizes better snippets (with closer phrase matches) and then snippets with keywords not yet included in the result. Generally, it aims to highlight the best match for the query and to highlight all query keywords, as allowed by the limits. If no matches are found in the current field, the beginning of the document will be trimmed according to the limits and returned by default. To return an empty string instead, set the allow_empty option to 1.

Highlighting is performed during the so-called post limit stage, which means that snippet generation is deferred not only until the entire final result set is prepared but also after the LIMIT clause is applied. For instance, with a LIMIT 20,10 clause, the HIGHLIGHT() function will be called a maximum of 10 times.

Highlighting options

There are several optional highlighting options that can be used to fine-tune snippet generation, which are common to SQL, HTTP, and PHP clients.

before_match

A string to insert before a keyword match. The %SNIPPET_ID% macro can be used in this string. The first occurrence of the macro is replaced with an incrementing snippet number within the current snippet. Numbering starts at 1 by default but can be overridden with the start_snippet_id option. %SNIPPET_ID% restarts at the beginning of each new document. The default is <b>.

after_match

A string to insert after a keyword match. The default is </b>.

limit

The maximum snippet size, in symbols (codepoints). The default is 256. This is applied per-field by default, see limits_per_field.

limit_words

Limits the maximum number of words that can be included in the result. Note that this limit applies to all words, not just the matched keywords to highlight. For example, if highlighting Mary and a snippet Mary had a little lamb is selected, it contributes 5 words to this limit, not just 1. The default is 0 (no limit). This is applied per-field by default, see limits_per_field.

limit_snippets

Limits the maximum number of snippets that can be included in the result. The default is 0 (no limit). This is applied per-field by default, see limits_per_field.

limits_per_field

Determines whether limit, limit_words, and limit_snippets operate as individual limits in each field of the document being highlighted or as global limits for the entire document. Setting this option to 0 means that all combined highlighting results for one document must be within the specified limits. The downside is that you may have several snippets highlighted in one field and none in another if the highlighting engine decides they are more relevant. The default is 1 (use per-field limits).

around

The number of words to select around each matching keyword block. The default is 5.

use_boundaries

Determines whether to additionally break snippets by phrase boundary characters, as configured in table settings with the phrase_boundary directive. The default is 0 (don't use boundaries).

weight_order

Specifies whether to sort the extracted snippets in order of relevance (decreasing weight) or in order of appearance in the document (increasing position). The default is 0 (don't use weight order).

force_all_words

Ignores the length limit until the result includes all keywords. The default is 0 (don't force all keywords).

start_snippet_id

Sets the starting value of the %SNIPPET_ID% macro (which is detected and expanded in before_match, after_match strings). The default is 1.

html_strip_mode

Defines the HTML stripping mode setting. Defaults to index, meaning that table settings will be used. Other values include none and strip, which forcibly skip or apply stripping regardless of table settings; and retain, which retains HTML markup and protects it from highlighting. The retain mode can only be used when highlighting full documents and therefore requires that no snippet size limits are set. The allowed string values are none, strip, index, and retain.

allow_empty

Permits an empty string to be returned as the highlighting result when no snippets could be generated in the current field (no keyword match or no snippets fit the limit). By default, the beginning of the original text would be returned instead of an empty string. The default is 0 (don't allow an empty result).

snippet_boundary

Ensures that snippets do not cross a sentence, paragraph, or zone boundary (when used with a table that has the respective indexing settings enabled). The allowed values are sentence, paragraph, and zone.

emit_zones

Emits an HTML tag with the enclosing zone name before each snippet. The default is 0 (don't emit zone names).

force_snippets

Determines whether to force snippet generation even if limits allow highlighting the entire text. The default is 0 (don't force snippet generation).

SQL
SELECT HIGHLIGHT({limit=50}) FROM books WHERE MATCH('try|gets|down|said');
+---------------------------------------------------------------------------+
| highlight({limit=50})                                                     |
+---------------------------------------------------------------------------+
|  ... , "It <b>gets</b> infantile pleasure  ...  to knock it <b>down</b>." |
| Don`t <b>try</b> to compete in childishness, <b>said</b> Bliss.           |
|  ...  a small room. Bander <b>said</b>, "Come, half-humans, I ...         |
+---------------------------------------------------------------------------+
3 rows in set (0.00 sec)
JSON
POST /search
{
  "index": "books",
  "query": {"query_string": "try|gets|down|said"},
  "highlight": { "limit":50 }
}
{
  "took":2,
  "timed_out":false,
  "hits":
  {
    "total":3,
    "hits":
    [
      {
        "_id":"3",
        "_score":1602,
        "_source":
        {
          "title":"Book three",
          "content":"Trevize whispered, \"It gets infantile pleasure out of display. I`d love to knock it down.\""
        },
        "highlight":
        {
          "title":
          [
            "Book three"
          ],
          "content":
          [
            ", \"It <b>gets</b> infantile pleasure ",
            " to knock it <b>down</b>.\""
          ]
        }
      },
      {
        "_id":"4",
        "_score":1573,
        "_source":
        {
          "title":"Book four",
          "content":"Don`t try to compete in childishness, said Bliss."
        },
        "highlight":
        {
          "title":
          [
            "Book four"
          ],
          "content":
          [
            "Don`t <b>try</b> to compete in childishness, <b>said</b> Bliss."
          ]
        }
      },
      {
        "_id":"2",
        "_score":1521,
        "_source":
        {
          "title":"Book two",
          "content":"A door opened before them, revealing a small room. Bander said, \"Come, half-humans, I want to show you how we live.\""
        },
        "highlight":
        {
          "title":
          [
            "Book two"
          ],
          "content":
          [
            " a small room. Bander <b>said</b>, \"Come, half-humans, I"
          ]
        }
      }
    ]
  }
}
PHP
$results = $index->search('try|gets|down|said')->highlight([],['limit'=>50])->get();
foreach($results as $doc)
{
    echo 'Document: '.$doc->getId();
    foreach($doc->getData() as $field=>$value)
    {
        echo $field.': '.$value;
    }
    foreach($doc->getHighlight() as $field=>$snippets)
    {
        echo "Highlight for ".$field.":\n";
        foreach($snippets as $snippet)
        {
            echo  $snippet."\n";
        }
    }
}
Document: 3
title: Book three
content: Trevize whispered, "It gets infantile pleasure out of display. I`d love to knock it down."
Highlight for title:
- Book four
Highlight for content:
, "It <b>gets</b> infantile pleasure
to knock it <b>down</b>."

Document: 4
title: Book four
content: Don`t try to compete in childishness, said Bliss.
Highlight for title:
- Book four
Highlight for content:
Don`t <b>try</b> to compete in childishness, <b>said</b> Bliss.

Document: 2
title: Book two
content: A door opened before them, revealing a small room. Bander said, "Come, half-humans, I want to show you how we live.
Highlight for title:
- Book two
Highlight for content:
 a small room. Bander <b>said</b>, \"Come, half-humans, I
Python
res = searchApi.search({"index":"books","query":{"match":{"*":"try"}},"highlight":{"limit":50}})
{'aggregations': None,
 'hits': {'hits': [{u'_id': u'4',
                    u'_score': 1695,
                    u'_source': {u'content': u'Don`t try to compete in childishness, said Bliss.',
                                 u'title': u'Book four'},
                    u'highlight': {u'content': [u'Don`t <b>try</b> to compete in childishness, said Bliss.'],
                                   u'title': [u'Book four']}}],
          'max_score': None,
          'total': 1},
 'profile': None,
 'timed_out': False,
 'took': 0}
Javascript
res =  await searchApi.search({"index":"books","query":{"query_string":"try|gets|down|said"},"highlight":{"limit":50}});
{"took":0,"timed_out":false,"hits":{"total":3,"hits":[{"_id":"3","_score":1597,"_source":{"title":"Book three","content":"Trevize whispered, \"It gets infantile pleasure out of display. I`d love to knock it down.\""},"highlight":{"title":["Book three"],"content":[", \"It <b>gets</b> infantile pleasure "," to knock it <b>down</b>.\""]}},{"_id":"4","_score":1563,"_source":{"title":"Book four","content":"Don`t try to compete in childishness, said Bliss."},"highlight":{"title":["Book four"],"content":["Don`t <b>try</b> to compete in childishness, <b>said</b> Bliss."]}},{"_id":"5","_score":1514,"_source":{"title":"Books two","content":"A door opened before them, revealing a small room. Bander said, \"Come, half-humans, I want to show you how we live.\""},"highlight":{"title":["Books two"],"content":[" a small room. Bander <b>said</b>, \"Come, half-humans, I"]}}]}}
Java
searchRequest = new SearchRequest();
searchRequest.setIndex("books");
query = new HashMap<String,Object>();
query.put("match",new HashMap<String,Object>(){{
    put("*","try|gets|down|said");
}});        
searchRequest.setQuery(query);
highlight = new HashMap<String,Object>(){{
    put("limit",50);
}};
searchRequest.setHighlight(highlight);
searchResponse = searchApi.search(searchRequest);
class SearchResponse {
    took: 0
    timedOut: false
    hits: class SearchResponseHits {
        total: 3
        maxScore: null
        hits: [{_id=3, _score=1597, _source={title=Book three, content=Trevize whispered, "It gets infantile pleasure out of display. I`d love to knock it down."}, highlight={title=[Book three], content=[, "It <b>gets</b> infantile pleasure ,  to knock it <b>down</b>."]}}, {_id=4, _score=1563, _source={title=Book four, content=Don`t try to compete in childishness, said Bliss.}, highlight={title=[Book four], content=[Don`t <b>try</b> to compete in childishness, <b>said</b> Bliss.]}}, {_id=5, _score=1514, _source={title=Books two, content=A door opened before them, revealing a small room. Bander said, "Come, half-humans, I want to show you how we live."}, highlight={title=[Books two], content=[ a small room. Bander <b>said</b>, "Come, half-humans, I]}}]
        aggregations: null
    }
    profile: null
}
C#
var searchRequest = new SearchRequest("books");
searchRequest.FulltextFilter = new MatchFilter("*", "try|gets|down|said");
var highlight = new Highlight();
highlight.Limit = 50;
searchRequest.Highlight = highlight;
var searchResponse = searchApi.Search(searchRequest);
class SearchResponse {
    took: 0
    timedOut: false
    hits: class SearchResponseHits {
        total: 3
        maxScore: null
        hits: [{_id=3, _score=1597, _source={title=Book three, content=Trevize whispered, "It gets infantile pleasure out of display. I`d love to knock it down."}, highlight={title=[Book three], content=[, "It <b>gets</b> infantile pleasure ,  to knock it <b>down</b>."]}}, {_id=4, _score=1563, _source={title=Book four, content=Don`t try to compete in childishness, said Bliss.}, highlight={title=[Book four], content=[Don`t <b>try</b> to compete in childishness, <b>said</b> Bliss.]}}, {_id=5, _score=1514, _source={title=Books two, content=A door opened before them, revealing a small room. Bander said, "Come, half-humans, I want to show you how we live."}, highlight={title=[Books two], content=[ a small room. Bander <b>said</b>, "Come, half-humans, I]}}]
        aggregations: null
    }
    profile: null
}
TypeScript
res = await searchApi.search({
  index: 'test',
  query: { match: { *: 'Text } },
  highlight: { limit: 2}
});
{
    "took":0,
    "timed_out":false,
    "hits":
    {
        "total":2,
        "hits":
        [{
            "_id":"1",
            "_score":1480,
            "_source":
            {
                "content":"Text 1",
                "name":"Doc 1",
                "cat":1
            },
            "highlight":
            {
                "content":
                [
                    "<b>Text 1</b>"
                ]
            }
        },
        {
            "_id":"2",
            "_score":1480,
            "_source":
            {
                "content":"Text 2",
                "name":"Doc 2",
                "cat":2
            },
            "highlight":
            {
                "content":
                [
                    "<b>Text 2</b>"
                ]
            }
        }]
    }
}
Go
matchClause := map[string]interface{} {"*": "Text 1"};
query := map[string]interface{} {"match": matchClause};
searchRequest.SetQuery(query);
highlight := manticoreclient.NewHighlight()
searchRequest.SetHighlight(highlight)
res, _, _ := apiClient.SearchAPI.Search(context.Background()).SearchRequest(*searchRequest).Execute()
{
    "took":0,
    "timed_out":false,
    "hits":
    {
        "total":2,
        "hits":
        [{
            "_id":"1",
            "_score":1480,
            "_source":
            {
                "content":"Text 1",
                "name":"Doc 1",
                "cat":1
            },
            "highlight":
            {
                "content":
                [
                    "<b>Text 1</b>"
                ]
            }
        },
        {
            "_id":"2",
            "_score":1480,
            "_source":
            {
                "content":"Text 2",
                "name":"Doc 2",
                "cat":2
            },
            "highlight":
            {
                "content":
                [
                    "<b>Text 2</b>"
                ]
            }
        }]
    }
}

Highlighting via SQL

The HIGHLIGHT() function can be used to highlight search results. Here's the syntax:

HIGHLIGHT([options], [field_list], [query] )

By default, it works with no arguments.

SQL
SELECT HIGHLIGHT() FROM books WHERE MATCH('before');
+-----------------------------------------------------------+
| highlight()                                               |
+-----------------------------------------------------------+
| A door opened <b>before</b> them, revealing a small room. |
+-----------------------------------------------------------+
1 row in set (0.00 sec)

HIGHLIGHT() retrieves all available full-text fields from document storage and highlights them against the provided query. Field syntax in queries is supported. Field text is separated by field_separator, which can be modified in the options.

SQL
SELECT HIGHLIGHT() FROM books WHERE MATCH('@title one');
+-----------------+
| highlight()     |
+-----------------+
| Book <b>one</b> |
+-----------------+
1 row in set (0.00 sec)

Optional first argument in HIGHLIGHT() is the list of options.

SQL
SELECT HIGHLIGHT({before_match='[match]',after_match='[/match]'}) FROM books WHERE MATCH('@title one');
+------------------------------------------------------------+
| highlight({before_match='[match]',after_match='[/match]'}) |
+------------------------------------------------------------+
| Book [match]one[/match]                                    |
+------------------------------------------------------------+
1 row in set (0.00 sec)

The optional second argument is a string containing a single field or a comma-separated list of fields. If this argument is present, only the specified fields will be fetched from document storage and highlighted. An empty string as the second argument signifies "fetch all available fields."

SQL
SELECT HIGHLIGHT({},'title,content') FROM books WHERE MATCH('one|robots');
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| highlight({},'title,content')                                                                                                                                                         |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Book <b>one</b> | They followed Bander. The <b>robots</b> remained at a polite distance, but their presence was a constantly felt threat.                                             |
| Bander ushered all three into the room. <b>One</b> of the <b>robots</b> followed as well. Bander gestured the other <b>robots</b> away and entered itself. The door closed behind it. |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
2 rows in set (0.00 sec)

Alternatively, you can use the second argument to specify a string attribute or field name without quotes. In this case, the supplied string will be highlighted against the provided query, but field syntax will be ignored.

SQL
SELECT HIGHLIGHT({}, title) FROM books WHERE MATCH('one');
+---------------------+
| highlight({},title) |
+---------------------+
| Book <b>one</b>     |
| Book five           |
+---------------------+
2 rows in set (0.00 sec)

The optional third argument is the query. This is used to highlight search results against a query different from the one used for searching.

SQL
SELECT HIGHLIGHT({},'title', 'five') FROM books WHERE MATCH('one');
+-------------------------------+
| highlight({},'title', 'five') |
+-------------------------------+
| Book one                      |
| Book <b>five</b>              |
+-------------------------------+
2 rows in set (0.00 sec)

Although HIGHLIGHT() is designed to work with stored full-text fields and string attributes, it can also be used to highlight arbitrary text. Keep in mind that if the query contains any field search operators (e.g., @title hello @body world), the field part of them is ignored in this case.

SQL
SELECT HIGHLIGHT({},TO_STRING('some text to highlight'), 'highlight') FROM books WHERE MATCH('@title one');
+----------------------------------------------------------------+
| highlight({},TO_STRING('some text to highlight'), 'highlight') |
+----------------------------------------------------------------+
| some text to <b>highlight</b>                                  |
+----------------------------------------------------------------+
1 row in set (0.00 sec)

Several options are relevant only when generating a single string as a result (not an array of snippets). This applies exclusively to the SQL HIGHLIGHT() function:

snippet_separator

A string to insert between snippets. The default is ....

field_separator

A string to insert between fields. The default is |.

Another way to highlight text is to use the CALL SNIPPETS statement. This mostly duplicates the HIGHLIGHT() functionality but cannot use built-in document storage. However, it can load source text from files.

Highlighting via HTTP

To highlight full-text search results in JSON queries via HTTP, field contents must be stored in document storage (enabled by default). In the example, full-text fields content and title are fetched from document storage and highlighted against the query specified in the query clause.

Highlighted snippets are returned in the highlight property of the hits array.

JSON
POST /search
{
  "index": "books",
  "query": { "match": { "*": "one|robots" } },
  "highlight":
  {
    "fields": ["content"]
  }
}
{
  "took": 0,
  "timed_out": false,
  "hits": {
    "total": 1,
    "hits": [
      {
        "_id": "1",
        "_score": 2788,
        "_source": {
          "title": "Books one",
          "content": "They followed Bander. The robots remained at a polite distance, but their presence was a constantly felt threat. Bander ushered all three into the room. One of the robots followed as well. Bander gestured the other robots away and entered itself. The door closed behind it. "
        },
        "highlight": {
          "content": [
            "They followed Bander. The <b>robots</b> remained at a polite distance, ",
            " three into the room. <b>One</b> of the <b>robots</b> followed as well. Bander",
            " gestured the other <b>robots</b> away and entered itself. The"
          ]
        }
      }
    ]
  }
}
PHP
$index->setName('books');
$results = $index->search('one|robots')->highlight(['content'])->get();
foreach($results as $doc)
{
    echo 'Document: '.$doc->getId()."\n";
    foreach($doc->getData() as $field=>$value)
    {
        echo $field.' : '.$value."\n";
    }
    foreach($doc->getHighlight() as $field=>$snippets)
    {
        echo "Highlight for ".$field.":\n";
        foreach($snippets as $snippet)
        {
            echo "- ".$snippet."\n";
        }
    }
}
Document: 1
title : Books one
content : They followed Bander. The robots remained at a polite distance, but their presence was a constantly felt threat. Bander ushered all three into the room. One of the robots followed as well. Bander gestured the other robots away and entered itself. The door closed behind it.
Highlight for content:
- They followed Bander. The <b>robots</b> remained at a polite distance,
-  three into the room. <b>One</b> of the <b>robots</b> followed as well. Bander
-  gestured the other <b>robots</b> away and entered itself. The
Python
res = searchApi.search({"index":"books","query":{"match":{"*":"one|robots"}},"highlight":{"fields":["content"]}}))
{'aggregations': None,
 'hits': {'hits': [{u'_id': u'1',
                    u'_score': 2788,
                    u'_source': {u'content': u'They followed Bander. The robots remained at a polite distance, but their presence was a constantly felt threat. Bander ushered all three into the room. One of the robots followed as well. Bander gestured the other robots away and entered itself. The door closed behind it. ',
                                 u'title': u'Books one'},
                    u'highlight': {u'content': [u'They followed Bander. The <b>robots</b> remained at a polite distance, ',
                                                u' three into the room. <b>One</b> of the <b>robots</b> followed as well. Bander',
                                                u' gestured the other <b>robots</b> away and entered itself. The']}}],
          'max_score': None,
          'total': 1},
 'profile': None,
 'timed_out': False,
 'took': 0}
Javascript
res =  await searchApi.search({"index":"books","query":{"match":{"*":"one|robots"}},"highlight":{"fields":["content"]}});
{"took":0,"timed_out":false,"hits":{"total":1,"hits":[{"_id":"1","_score":2788,"_source":{"title":"Books one","content":"They followed Bander. The robots remained at a polite distance, but their presence was a constantly felt threat. Bander ushered all three into the room. One of the robots followed as well. Bander gestured the other robots away and entered itself. The door closed behind it. "},"highlight":{"content":["They followed Bander. The <b>robots</b> remained at a polite distance, "," three into the room. <b>One</b> of the <b>robots</b> followed as well. Bander"," gestured the other <b>robots</b> away and entered itself. The"]}}]}}
Java
searchRequest = new SearchRequest();
searchRequest.setIndex("books");
query = new HashMap<String,Object>();
query.put("match",new HashMap<String,Object>(){{
    put("*","one|robots");
}});        
searchRequest.setQuery(query);
highlight = new HashMap<String,Object>(){{
    put("fields",new String[] {"content"});
}};
searchRequest.setHighlight(highlight);
searchResponse = searchApi.search(searchRequest);
class SearchResponse {
    took: 0
    timedOut: false
    hits: class SearchResponseHits {
        total: 1
        maxScore: null
        hits: [{_id=1, _score=2788, _source={title=Books one, content=They followed Bander. The robots remained at a polite distance, but their presence was a constantly felt threat. Bander ushered all three into the room. One of the robots followed as well. Bander gestured the other robots away and entered itself. The door closed behind it. }, highlight={title=[Books <b>one</b>], content=[They followed Bander. The <b>robots</b> remained at a polite distance, ,  three into the room. <b>One</b> of the <b>robots</b> followed as well. Bander,  gestured the other <b>robots</b> away and entered itself. The]}}]
        aggregations: null
    }
    profile: null
}
C#
var searchRequest = new SearchRequest("books");
searchRequest.FulltextFilter = new MatchFilter("*", "one|robots");
var highlight = new Highlight();
highlight.Fieldnames = new List<string> {"content"};
searchRequest.Highlight = highlight;
var searchResponse = searchApi.Search(searchRequest);
class SearchResponse {
    took: 0
    timedOut: false
    hits: class SearchResponseHits {
        total: 1
        maxScore: null
        hits: [{_id=1, _score=2788, _source={title=Books one, content=They followed Bander. The robots remained at a polite distance, but their presence was a constantly felt threat. Bander ushered all three into the room. One of the robots followed as well. Bander gestured the other robots away and entered itself. The door closed behind it. }, highlight={title=[Books <b>one</b>], content=[They followed Bander. The <b>robots</b> remained at a polite distance, ,  three into the room. <b>One</b> of the <b>robots</b> followed as well. Bander,  gestured the other <b>robots</b> away and entered itself. The]}}]
        aggregations: null
    }
    profile: null
}
TypeScript
res = await searchApi.search({
  index: 'test',
  query: {
    match: {

        *: 'Text 1|Text 9'
    }
  },
  highlight: {}
});
{
    "took":0,
    "timed_out":false,
    "hits":
    {
        "total":1,
        "hits":
        [{
            "_id":"1",
            "_score":1480,
            "_source":
            {
                "content":"Text 1",
                "name":"Doc 1",
                "cat":1
            },
            "highlight":
            {
                "content":
                [
                    "<b>Text 1</b>"
                ]
            }
        ]}
    }
}
Go
matchClause := map[string]interface{} {"*": "Text 1|Text 9"};
query := map[string]interface{} {"match": matchClause};
searchRequest.SetQuery(query);
highlight := manticoreclient.NewHighlight()
searchRequest.SetHighlight(highlight)
res, _, _ := apiClient.SearchAPI.Search(context.Background()).SearchRequest(*searchRequest).Execute()
{
    "took":0,
    "timed_out":false,
    "hits":
    {
        "total":1,
        "hits":
        [{
            "_id":"1",
            "_score":1480,
            "_source":
            {
                "content":"Text 1",
                "name":"Doc 1",
                "cat":1
            },
            "highlight":
            {
                "content":
                [
                    "<b>Text 1</b>"
                ]
            }
        ]}
    }
}

To highlight all possible fields, pass an empty object as the highlight property.

JSON
POST /search
{
  "index": "books",
  "query": { "match": { "*": "one|robots" } },
  "highlight": {}
}
{
  "took": 0,
  "timed_out": false,
  "hits": {
    "total": 1,
    "hits": [
      {
        "_id": "1",
        "_score": 2788,
        "_source": {
          "title": "Books one",
          "content": "They followed Bander. The robots remained at a polite distance, but their presence was a constantly felt threat. Bander ushered all three into the room. One of the robots followed as well. Bander gestured the other robots away and entered itself. The door closed behind it. "
        },
        "highlight": {
          "title": [
            "Books <b>one</b>"
          ],
          "content": [
            "They followed Bander. The <b>robots</b> remained at a polite distance, ",
            " three into the room. <b>One</b> of the <b>robots</b> followed as well. Bander",
            " gestured the other <b>robots</b> away and entered itself. The"
          ]
        }
      }
    ]
  }
}
PHP
$index->setName('books');
$results = $index->search('one|robots')->highlight()->get();
foreach($results as $doc)
{
    echo 'Document: '.$doc->getId()."\n";
    foreach($doc->getData() as $field=>$value)
    {
        echo $field.' : '.$value."\n";
    }
    foreach($doc->getHighlight() as $field=>$snippets)
    {
        echo "Highlight for ".$field.":\n";
        foreach($snippets as $snippet)
        {
            echo "- ".$snippet."\n";
        }
    }
}
Document: 1
title : Books one
content : They followed Bander. The robots remained at a polite distance, but their presence was a constantly felt threat. Bander ushered all three into the room. One of the robots followed as well. Bander gestured the other robots away and entered itself. The door closed behind it.
Highlight for title:
- Books <b>one</b>
Highlight for content:
- They followed Bander. The <b>robots</b> remained at a polite distance,
-  three into the room. <b>One</b> of the <b>robots</b> followed as well. Bander
-  gestured the other <b>robots</b> away and entered itself. The
Python
res = searchApi.search({"index":"books","query":{"match":{"*":"one|robots"}},"highlight":{}})
{'aggregations': None,
 'hits': {'hits': [{u'_id': u'1',
                    u'_score': 2788,
                    u'_source': {u'content': u'They followed Bander. The robots remained at a polite distance, but their presence was a constantly felt threat. Bander ushered all three into the room. One of the robots followed as well. Bander gestured the other robots away and entered itself. The door closed behind it. ',
                                 u'title': u'Books one'},
                    u'highlight': {u'content': [u'They followed Bander. The <b>robots</b> remained at a polite distance, ',
                                                u' three into the room. <b>One</b> of the <b>robots</b> followed as well. Bander',
                                                u' gestured the other <b>robots</b> away and entered itself. The'],
                                   u'title': [u'Books <b>one</b>']}}],
          'max_score': None,
          'total': 1},
 'profile': None,
 'timed_out': False,
 'took': 0}
Javascript
res =  await searchApi.search({"index":"books","query":{"match":{"*":"one|robots"}},"highlight":{}});
{"took":0,"timed_out":false,"hits":{"total":1,"hits":[{"_id":"1","_score":2788,"_source":{"title":"Books one","content":"They followed Bander. The robots remained at a polite distance, but their presence was a constantly felt threat. Bander ushered all three into the room. One of the robots followed as well. Bander gestured the other robots away and entered itself. The door closed behind it. "},"highlight":{"title":["Books <b>one</b>"],"content":["They followed Bander. The <b>robots</b> remained at a polite distance, "," three into the room. <b>One</b> of the <b>robots</b> followed as well. Bander"," gestured the other <b>robots</b> away and entered itself. The"]}}]}}
Java
searchRequest = new SearchRequest();
searchRequest.setIndex("books");
query = new HashMap<String,Object>();
query.put("match",new HashMap<String,Object>(){{
    put("*","one|robots");
}});        
searchRequest.setQuery(query);
highlight = new HashMap<String,Object>(){{
}};
searchRequest.setHighlight(highlight);
searchResponse = searchApi.search(searchRequest);
class SearchResponse {
    took: 0
    timedOut: false
    hits: class SearchResponseHits {
        total: 1
        maxScore: null
        hits: [{_id=1, _score=2788, _source={title=Books one, content=They followed Bander. The robots remained at a polite distance, but their presence was a constantly felt threat. Bander ushered all three into the room. One of the robots followed as well. Bander gestured the other robots away and entered itself. The door closed behind it. }, highlight={title=[Books <b>one</b>], content=[They followed Bander. The <b>robots</b> remained at a polite distance, ,  three into the room. <b>One</b> of the <b>robots</b> followed as well. Bander,  gestured the other <b>robots</b> away and entered itself. The]}}]
        aggregations: null
    }
    profile: null
}
C#
var searchRequest = new SearchRequest("books");
searchRequest.FulltextFilter = new MatchFilter("*", "one|robots");
var highlight = new Highlight();
searchRequest.Highlight = highlight;
var searchResponse = searchApi.Search(searchRequest);
class SearchResponse {
    took: 0
    timedOut: false
    hits: class SearchResponseHits {
        total: 1
        maxScore: null
        hits: [{_id=1, _score=2788, _source={title=Books one, content=They followed Bander. The robots remained at a polite distance, but their presence was a constantly felt threat. Bander ushered all three into the room. One of the robots followed as well. Bander gestured the other robots away and entered itself. The door closed behind it. }, highlight={title=[Books <b>one</b>], content=[They followed Bander. The <b>robots</b> remained at a polite distance, ,  three into the room. <b>One</b> of the <b>robots</b> followed as well. Bander,  gestured the other <b>robots</b> away and entered itself. The]}}]
        aggregations: null
    }
    profile: null
}
TypeScript
res = await searchApi.search({
  index: 'test',
  query: {
    match: {

        *: 'Text 1|Doc 1'
    }
  },
  highlight: {}
});
{
    "took":0,
    "timed_out":false,
    "hits":
    {
        "total":1,
        "hits":
        [{
            "_id":"1",
            "_score":1480,
            "_source":
            {
                "content":"Text 1",
                "name":"Doc 1",
                "cat":1
            },
            "highlight":
            {
                "content":
                [
                    "<b>Text 1</b>"
                ],
                "name":
                [
                    "<b>Doc 1</b>"
                ]
            }
        ]}
    }
}
Go
matchClause := map[string]interface{} {"*": "Text 1|Doc 1"};
query := map[string]interface{} {"match": matchClause};
searchRequest.SetQuery(query);
highlight := manticoreclient.NewHighlight()
searchRequest.SetHighlight(highlight)
res, _, _ := apiClient.SearchAPI.Search(context.Background()).SearchRequest(*searchRequest).Execute()
{
    "took":0,
    "timed_out":false,
    "hits":
    {
        "total":1,
        "hits":
        [{
            "_id":"1",
            "_score":1480,
            "_source":
            {
                "content":"Text 1",
                "name":"Doc 1",
                "cat":1
            },
            "highlight":
            {
                "content":
                [
                    "<b>Text 1</b>"
                ],
                "name":
                [
                    "<b>Doc 1</b>"
                ]
            }
        ]}
    }
}

In addition to common highlighting options, several synonyms are available for JSON queries via HTTP:

fields

The fields object contains attribute names with options. It can also be an array of field names (without any options).

Note that by default, highlighting attempts to highlight the results following the full-text query. In a general case, when you don't specify fields to highlight, the highlight is based on your full-text query. However, if you specify fields to highlight, it highlights only if the full-text query matches the selected fields.

encoder

The encoder can be set to default or html. When set to html, it retains HTML markup when highlighting. This works similarly to the html_strip_mode=retain option.

highlight_query

The highlight_query option allows you to highlight against a query other than your search query. The syntax is the same as in the main query.

JSON
POST /search
{
  "index": "books",
  "query": { "match": { "content": "one|robots" } },
  "highlight":
  {
    "fields": [ "content"],
    "highlight_query": { "match": { "*":"polite distance" } }
   }
}
PHP
$index->setName('books');
$bool = new \Manticoresearch\Query\BoolQuery();
$bool->must(new \Manticoresearch\Query\Match(['query' => 'one|robots'], 'content'));

$results = $index->search($bool)->highlight(['content'],['highlight_query'=>['match'=>['*'=>'polite distance']]])->get();
foreach($results as $doc)
{
    echo 'Document: '.$doc->getId()."\n";
    foreach($doc->getData() as $field=>$value)
    {
        echo $field.' : '.$value."\n";
    }
    foreach($doc->getHighlight() as $field=>$snippets)
    {
        echo "Highlight for ".$field.":\n";
        foreach($snippets as $snippet)
        {
            echo "- ".$snippet."\n";
        }
    }
}
Python
res = searchApi.search({"index":"books","query":{"match":{"content":"one|robots"}},"highlight":{"fields":["content"],"highlight_query":{"match":{"*":"polite distance"}}}})
{'aggregations': None,
 'hits': {'hits': [{u'_id': u'1',
                    u'_score': 1788,
                    u'_source': {u'content': u'They followed Bander. The robots remained at a polite distance, but their presence was a constantly felt threat. Bander ushered all three into the room. One of the robots followed as well. Bander gestured the other robots away and entered itself. The door closed behind it. ',
                                 u'title': u'Books one'},
                    u'highlight': {u'content': [u'. The robots remained at a <b>polite distance</b>, but their presence was a']}}],
          'max_score': None,
          'total': 1},
 'profile': None,
 'timed_out': False,
 'took': 0}
Javascript
res =  await searchApi.search({"index":"books","query":{"match":{"content":"one|robots"}},"highlight":{"fields":["content"],"highlight_query":{"match":{"*":"polite distance"}}}});
{"took":0,"timed_out":false,"hits":{"total":1,"hits":[{"_id":"1","_score":1788,"_source":{"title":"Books one","content":"They followed Bander. The robots remained at a polite distance, but their presence was a constantly felt threat. Bander ushered all three into the room. One of the robots followed as well. Bander gestured the other robots away and entered itself. The door closed behind it. "},"highlight":{"content":[". The robots remained at a <b>polite distance</b>, but their presence was a"]}}]}}
Java
searchRequest = new SearchRequest();
searchRequest.setIndex("books");
query = new HashMap<String,Object>();
query.put("match",new HashMap<String,Object>(){{
    put("*","one|robots");
}});        
searchRequest.setQuery(query);
highlight = new HashMap<String,Object>(){{
put("fields",new String[] {"content","title"});
put("highlight_query",
    new HashMap<String,Object>(){{
        put("match", new HashMap<String,Object>(){{
            put("*","polite distance");
        }});
    }});
}};
searchRequest.setHighlight(highlight);
searchResponse = searchApi.search(searchRequest);
C#
var searchRequest = new SearchRequest("books");
searchRequest.FulltextFilter = new MatchFilter("*", "one|robots");
var highlight = new Highlight();
highlight.Fieldnames = new List<string> {"content", "title"};
Dictionary<string, Object> match = new Dictionary<string, Object>(); 
match.Add("*", "polite distance");
Dictionary<string, Object> highlightQuery = new Dictionary<string, Object>(); 
highlightQuery.Add("match", match);
highlight.HighlightQuery = highlightQuery;
searchRequest.Highlight = highlight;
var searchResponse = searchApi.Search(searchRequest);
TypeScript
res = await searchApi.search({
  index: 'test',
  query: {
    match: {

        *: 'Text 1'
    }
  },
  highlight: {
    fields: ['content'],
    highlight_query: {
      match: {*: 'Text'}
    }
  }
});
{
    "took":0,
    "timed_out":false,
    "hits":
    {
        "total":1,
        "hits":
        [{
            "_id":"1",
            "_score":1480,
            "_source":
            {
                "content":"Text 1",
                "name":"Doc 1",
                "cat":1
            },
            "highlight":
            {
                "content":
                [
                    "<b>Text</b> 1"
                ]
            }
        ]}
    }
}
Go
matchClause := map[string]interface{} {"*": "Text 1"};
query := map[string]interface{} {"match": matchClause};
searchRequest.SetQuery(query);
highlight := manticoreclient.NewHighlight()
highlightField := manticoreclient.NetHighlightField("content")
highlightFields := []interface{} { highlightField } 
highlight.SetFields(highlightFields)
queryMatchClause := map[string]interface{} {"*": "Text"};
highlightQuery := map[string]interface{} {"match": queryMatchClause};
highlight.SetHighlightQuery(highlightQuery)
searchRequest.SetHighlight(highlight)
res, _, _ := apiClient.SearchAPI.Search(context.Background()).SearchRequest(*searchRequest).Execute()
{
    "took":0,
    "timed_out":false,
    "hits":
    {
        "total":1,
        "hits":
        [{
            "_id":"1",
            "_score":1480,
            "_source":
            {
                "content":"Text 1",
                "name":"Doc 1",
                "cat":1
            },
            "highlight":
            {
                "content":
                [
                    "<b>Text</b> 1"
                ]
            }
        ]}
    }
}

pre_tags and post_tags

pre_tags and post_tags set the opening and closing tags for highlighted text snippets. They function similarly to the before_match and after_match options. These are optional, with default values of <b> and </b>.

JSON
POST /search
{
  "index": "books",
  "query": { "match": { "*": "one|robots" } },
  "highlight":
  {
    "fields": [ "content", "title" ],
    "pre_tags": "before_",
    "post_tags": "_after"
   }
}
PHP
$index->setName('books');
$bool = new \Manticoresearch\Query\BoolQuery();
$bool->must(new \Manticoresearch\Query\Match(['query' => 'one|robots'], '*'));

$results = $index->search($bool)->highlight(['content','title'],['pre_tags'=>'before_','post_tags'=>'_after'])->get();
foreach($results as $doc)
{
    echo 'Document: '.$doc->getId()."\n";
    foreach($doc->getData() as $field=>$value)
    {
        echo $field.' : '.$value."\n";
    }
    foreach($doc->getHighlight() as $field=>$snippets)
    {
        echo "Highlight for ".$field.":\n";
        foreach($snippets as $snippet)
        {
            echo "- ".$snippet."\n";
        }
    }
}
Document: 1
title : Books one
content : They followed Bander. The robots remained at a polite distance, but their presence was a constantly felt threat. Bander ushered all three into the room. One of the robots followed as well. Bander gestured the other robots away and entered itself. The door closed behind it.
Highlight for content:
- They followed Bander. The before_robots_after remained at a polite distance,
-  three into the room. before_One_after of the before_robots_after followed as well. Bander
-  gestured the other before_robots_after away and entered itself. The
Highlight for title:
- Books before_one_after
Python
res = searchApi.search({"index":"books","query":{"match":{"*":"one|robots"}},"highlight":{"fields":["content","title"],"pre_tags":"before_","post_tags":"_after"}})
{'aggregations': None,
 'hits': {'hits': [{u'_id': u'1',
                    u'_score': 2788,
                    u'_source': {u'content': u'They followed Bander. The robots remained at a polite distance, but their presence was a constantly felt threat. Bander ushered all three into the room. One of the robots followed as well. Bander gestured the other robots away and entered itself. The door closed behind it. ',
                                 u'title': u'Books one'},
                    u'highlight': {u'content': [u'They followed Bander. The before_robots_after remained at a polite distance, ',
                                                u' three into the room. before_One_after of the before_robots_after followed as well. Bander',
                                                u' gestured the other before_robots_after away and entered itself. The'],
                                   u'title': [u'Books before_one_after']}}],
          'max_score': None,
          'total': 1},
 'profile': None,
 'timed_out': False,
 'took': 0}
Javascript
res =  await searchApi.search({"index":"books","query":{"match":{"*":"one|robots"}},"highlight":{"fields":["content","title"],"pre_tags":"before_","post_tags":"_after"}});
{"took":0,"timed_out":false,"hits":{"total":1,"hits":[{"_id":"1","_score":2788,"_source":{"title":"Books one","content":"They followed Bander. The robots remained at a polite distance, but their presence was a constantly felt threat. Bander ushered all three into the room. One of the robots followed as well. Bander gestured the other robots away and entered itself. The door closed behind it. "},"highlight":{"content":["They followed Bander. The before_robots_after remained at a polite distance, "," three into the room. before_One_after of the before_robots_after followed as well. Bander"," gestured the other before_robots_after away and entered itself. The"],"title":["Books before_one_after"]}}]}}
Java
searchRequest = new SearchRequest();
searchRequest.setIndex("books");
query = new HashMap<String,Object>();
query.put("match",new HashMap<String,Object>(){{
    put("*","one|robots");
}});        
searchRequest.setQuery(query);
highlight = new HashMap<String,Object>(){{
    put("fields",new String[] {"content","title"});
    put("pre_tags","before_");
    put("post_tags","_after");
}};
searchRequest.setHighlight(highlight);
searchResponse = searchApi.search(searchRequest);
C#
var searchRequest = new SearchRequest("books");
searchRequest.FulltextFilter = new MatchFilter("*", "one|robots");
var highlight = new Highlight();
highlight.Fieldnames = new List<string> {"content", "title"};
highlight.PreTags = "before_";
highlight.PostTags = "_after";
searchRequest.Highlight = highlight;
var searchResponse = searchApi.Search(searchRequest);
TypeScript
res = await searchApi.search({
  index: 'test',
  query: {
    match: {

        *: 'Text 1'
    }
  },
  highlight: {
    pre_tags: 'before_',
    post_tags: '_after'
  }
});
{
    "took":0,
    "timed_out":false,
    "hits":
    {
        "total":1,
        "hits":
        [{
            "_id":"1",
            "_score":1480,
            "_source":
            {
                "content":"Text 1",
                "name":"Doc 1",
                "cat":1
            },
            "highlight":
            {
                "content":
                [
                    "before_Text 1_after"
                ]
            }
        ]}
    }
}
Go
matchClause := map[string]interface{} {"*": "Text 1"}
query := map[string]interface{} {"match": matchClause}
searchRequest.SetQuery(query)
highlight := manticoreclient.NewHighlight()
highlight.SetPreTags("before_")
highlight.SetPostTags("_after")
searchRequest.SetHighlight(highlight)
res, _, _ := apiClient.SearchAPI.Search(context.Background()).SearchRequest(*searchRequest).Execute()
{
    "took":0,
    "timed_out":false,
    "hits":
    {
        "total":1,
        "hits":
        [{
            "_id":"1",
            "_score":1480,
            "_source":
            {
                "content":"Text 1",
                "name":"Doc 1",
                "cat":1
            },
            "highlight":
            {
                "content":
                [
                    "before_Text 1_after"
                ]
            }
        ]}
    }
}

no_match_size

no_match_size functions similarly to the allow_empty option. If set to 0, it acts as allow_empty=1, allowing an empty string to be returned as a highlighting result when a snippet could not be generated. Otherwise, the beginning of the field will be returned. This is optional, with a default value of 1.

JSON
POST /search
{
  "index": "books",
  "query": { "match": { "*": "one|robots" } },
  "highlight":
  {
    "fields": [ "content", "title" ],
    "no_match_size": 0
  }
}
PHP
$index->setName('books');
$bool = new \Manticoresearch\Query\BoolQuery();
$bool->must(new \Manticoresearch\Query\Match(['query' => 'one|robots'], '*'));

$results = $index->search($bool)->highlight(['content','title'],['no_match_size'=>0])->get();
foreach($results as $doc)
{
    echo 'Document: '.$doc->getId()."\n";
    foreach($doc->getData() as $field=>$value)
    {
        echo $field.' : '.$value."\n";
    }
    foreach($doc->getHighlight() as $field=>$snippets)
    {
        echo "Highlight for ".$field.":\n";
        foreach($snippets as $snippet)
        {
            echo "- ".$snippet."\n";
        }
    }
}
Document: 1
title : Books one
content : They followed Bander. The robots remained at a polite distance, but their presence was a constantly felt threat. Bander ushered all three into the room. One of the robots followed as well. Bander gestured the other robots away and entered itself. The door closed behind it.
Highlight for content:
- They followed Bander. The <b>robots</b> remained at a polite distance,
-  three into the room. <b>One</b> of the <b>robots</b> followed as well. Bander
-  gestured the other <b>robots</b> away and entered itself. The
Highlight for title:
- Books <b>one</b>
Python
res = searchApi.search({"index":"books","query":{"match":{"*":"one|robots"}},"highlight":{"fields":["content","title"],"no_match_size":0}})
{'aggregations': None,
 'hits': {'hits': [{u'_id': u'1',
                    u'_score': 2788,
                    u'_source': {u'content': u'They followed Bander. The robots remained at a polite distance, but their presence was a constantly felt threat. Bander ushered all three into the room. One of the robots followed as well. Bander gestured the other robots away and entered itself. The door closed behind it. ',
                                 u'title': u'Books one'},
                    u'highlight': {u'content': [u'They followed Bander. The <b>robots</b> remained at a polite distance, ',
                                                u' three into the room. <b>One</b> of the <b>robots</b> followed as well. Bander',
                                                u' gestured the other <b>robots</b> away and entered itself. The'],
                                   u'title': [u'Books <b>one</b>']}}],
          'max_score': None,
          'total': 1},
 'profile': None,
 'timed_out': False,
 'took': 0}
Javascript
res =  await searchApi.search({"index":"books","query":{"match":{"*":"one|robots"}},"highlight":{"fields":["content","title"],"no_match_size":0}});
{"took":0,"timed_out":false,"hits":{"total":1,"hits":[{"_id":"1","_score":2788,"_source":{"title":"Books one","content":"They followed Bander. The robots remained at a polite distance, but their presence was a constantly felt threat. Bander ushered all three into the room. One of the robots followed as well. Bander gestured the other robots away and entered itself. The door closed behind it. "},"highlight":{"content":["They followed Bander. The <b>robots</b> remained at a polite distance, "," three into the room. <b>One</b> of the <b>robots</b> followed as well. Bander"," gestured the other <b>robots</b> away and entered itself. The"],"title":["Books <b>one</b>"]}}]}}
Java
searchRequest = new SearchRequest();
searchRequest.setIndex("books");
query = new HashMap<String,Object>();
query.put("match",new HashMap<String,Object>(){{
    put("*","one|robots");
}});        
searchRequest.setQuery(query);
highlight = new HashMap<String,Object>(){{
    put("fields",new String[] {"content","title"});
    put("no_match_size",0);
}};
searchRequest.setHighlight(highlight);
searchResponse = searchApi.search(searchRequest);
C#
var searchRequest = new SearchRequest("books");
searchRequest.FulltextFilter = new MatchFilter("*", "one|robots");
var highlight = new Highlight();
highlight.Fieldnames = new List<string> {"content", "title"};
highlight.NoMatchSize = 0;
searchRequest.Highlight = highlight;
var searchResponse = searchApi.Search(searchRequest);
TypeScript
res = await searchApi.search({
  index: 'test',
  query: {
    match: {

        *: 'Text 1'
    }
  },
  highlight: {no_match_size: 0}
});
{
    "took":0,
    "timed_out":false,
    "hits":
    {
        "total":1,
        "hits":
        [{
            "_id":"1",
            "_score":1480,
            "_source":
            {
                "content":"Text 1",
                "name":"Doc 1",
                "cat":1
            },
            "highlight":
            {
                "content":
                [
                    "<b>Text 1</b>"
                ]
            }
        ]}
    }
}
Go
matchClause := map[string]interface{} {"*": "Text 1"};
query := map[string]interface{} {"match": matchClause};
searchRequest.SetQuery(query);
highlight := manticoreclient.NewHighlight()
highlight.SetNoMatchSize(0)
searchRequest.SetHighlight(highlight)
res, _, _ := apiClient.SearchAPI.Search(context.Background()).SearchRequest(*searchRequest).Execute()
{
    "took":0,
    "timed_out":false,
    "hits":
    {
        "total":1,
        "hits":
        [{
            "_id":"1",
            "_score":1480,
            "_source":
            {
                "content":"Text 1",
                "name":"Doc 1",
                "cat":1
            },
            "highlight":
            {
                "content":
                [
                    "<b>Text 1</b>"
                ]
            }
        ]}
    }
}

order

order sets the sorting order of extracted snippets. If set to "score", it sorts the extracted snippets in order of relevance. This is optional and works similarly to the weight_order option.

JSON
POST /search
{
  "index": "books",
  "query": { "match": { "*": "one|robots" } },
  "highlight":
  {
    "fields": [ "content", "title" ],
    "order": "score"
  }
}
PHP
$index->setName('books');
$bool = new \Manticoresearch\Query\BoolQuery();
$bool->must(new \Manticoresearch\Query\Match(['query' => 'one|robots'], '*'));

$results = $index->search($bool)->highlight(['content','title'],['order'=>"score"])->get();
foreach($results as $doc)
{
    echo 'Document: '.$doc->getId()."\n";
    foreach($doc->getData() as $field=>$value)
    {
        echo $field.' : '.$value."\n";
    }
    foreach($doc->getHighlight() as $field=>$snippets)
    {
        echo "Highlight for ".$field.":\n";
        foreach($snippets as $snippet)
        {
            echo "- ".$snippet."\n";
        }
    }
}
Document: 1
title : Books one
content : They followed Bander. The robots remained at a polite distance, but their presence was a constantly felt threat. Bander ushered all three into the room. One of the robots followed as well. Bander gestured the other robots away and entered itself. The door closed behind it.
Highlight for content:
-  three into the room. <b>One</b> of the <b>robots</b> followed as well. Bander
-  gestured the other <b>robots</b> away and entered itself. The
- They followed Bander. The <b>robots</b> remained at a polite distance,
Highlight for title:
- Books <b>one</b>
Python
res = searchApi.search({"index":"books","query":{"match":{"*":"one|robots"}},"highlight":{"fields":["content","title"],"order":"score"}})
{'aggregations': None,
 'hits': {'hits': [{u'_id': u'1',
                    u'_score': 2788,
                    u'_source': {u'content': u'They followed Bander. The robots remained at a polite distance, but their presence was a constantly felt threat. Bander ushered all three into the room. One of the robots followed as well. Bander gestured the other robots away and entered itself. The door closed behind it. ',
                                 u'title': u'Books one'},
                    u'highlight': {u'content': [u' three into the room. <b>One</b> of the <b>robots</b> followed as well. Bander',
                                                u' gestured the other <b>robots</b> away and entered itself. The',
                                                u'They followed Bander. The <b>robots</b> remained at a polite distance, '],
                                   u'title': [u'Books <b>one</b>']}}],
          'max_score': None,
          'total': 1},
 'profile': None,
 'timed_out': False,
 'took': 0}
Javascript
res =  await searchApi.search({"index":"books","query":{"match":{"*":"one|robots"}},"highlight":{"fields":["content","title"],"order":"score"}});
{"took":0,"timed_out":false,"hits":{"total":1,"hits":[{"_id":"1","_score":2788,"_source":{"title":"Books one","content":"They followed Bander. The robots remained at a polite distance, but their presence was a constantly felt threat. Bander ushered all three into the room. One of the robots followed as well. Bander gestured the other robots away and entered itself. The door closed behind it. "},"highlight":{"content":[" three into the room. <b>One</b> of the <b>robots</b> followed as well. Bander"," gestured the other <b>robots</b> away and entered itself. The","They followed Bander. The <b>robots</b> remained at a polite distance, "],"title":["Books <b>one</b>"]}}]}}
Java
searchRequest = new SearchRequest();
searchRequest.setIndex("books");
query = new HashMap<String,Object>();
query.put("match",new HashMap<String,Object>(){{
    put("*","one|robots");
}});        
searchRequest.setQuery(query);
highlight = new HashMap<String,Object>(){{
    put("fields",new String[] {"content","title"});
    put("order","score");
}};
searchRequest.setHighlight(highlight);
searchResponse = searchApi.search(searchRequest);
C#
var searchRequest = new SearchRequest("books");
searchRequest.FulltextFilter = new MatchFilter("*", "one|robots");
var highlight = new Highlight();
highlight.Fieldnames = new List<string> {"content", "title"};
highlight.Order =  "score";
searchRequest.Highlight = highlight;
var searchResponse = searchApi.Search(searchRequest);
TypeScript
res = await searchApi.search({
  index: 'test',
  query: {
    match: {

        *: 'Text 1'
    }
  },
  highlight: { order: 'score' }
});
{
    "took":0,
    "timed_out":false,
    "hits":
    {
        "total":1,
        "hits":
        [{
            "_id":"1",
            "_score":1480,
            "_source":
            {
                "content":"Text 1",
                "name":"Doc 1",
                "cat":1
            },
            "highlight":
            {
                "content":
                [
                    "<b>Text 1</b>"
                ]
            }
        ]}
    }
}
Go
matchClause := map[string]interface{} {"*": "Text 1"};
query := map[string]interface{} {"match": matchClause};
searchRequest.SetQuery(query);
highlight := manticoreclient.NewHighlight()
highlight.SetOrder("score")
searchRequest.SetHighlight(highlight)
res, _, _ := apiClient.SearchAPI.Search(context.Background()).SearchRequest(*searchRequest).Execute()
{
    "took":0,
    "timed_out":false,
    "hits":
    {
        "total":1,
        "hits":
        [{
            "_id":"1",
            "_score":1480,
            "_source":
            {
                "content":"Text 1",
                "name":"Doc 1",
                "cat":1
            },
            "highlight":
            {
                "content":
                [
                    "<b>Text 1</b>"
                ]
            }
        ]}
    }
}

fragment_size

fragment_size sets the maximum snippet size in symbols. It can be global or per-field. Per-field options override global options. This is optional, with a default value of 256. It works similarly to the limit option.

JSON
POST /search
{
  "index": "books",
  "query": { "match": { "*": "one|robots" } },
  "highlight":
  {
    "fields": [ "content", "title" ],
    "fragment_size": 100
  }
}
PHP
$index->setName('books');
$bool = new \Manticoresearch\Query\BoolQuery();
$bool->must(new \Manticoresearch\Query\Match(['query' => 'one|robots'], '*'));

$results = $index->search($bool)->highlight(['content','title'],['fragment_size'=>100])->get();
foreach($results as $doc)
{
    echo 'Document: '.$doc->getId()."\n";
    foreach($doc->getData() as $field=>$value)
    {
        echo $field.' : '.$value."\n";
    }
    foreach($doc->getHighlight() as $field=>$snippets)
    {
        echo "Highlight for ".$field.":\n";
        foreach($snippets as $snippet)
        {
            echo "- ".$snippet."\n";
        }
    }
}
Document: 1
title : Books one
content : They followed Bander. The robots remained at a polite distance, but their presence was a constantly felt threat. Bander ushered all three into the room. One of the robots followed as well. Bander gestured the other robots away and entered itself. The door closed behind it.
Highlight for content:
-  the room. <b>One</b> of the <b>robots</b> followed as well
- Bander gestured the other <b>robots</b> away and entered
Highlight for title:
- Books <b>one</b>
Python
res = searchApi.search({"index":"books","query":{"match":{"*":"one|robots"}},"highlight":{"fields":["content","title"],"fragment_size":100}})
{'aggregations': None,
 'hits': {'hits': [{u'_id': u'1',
                    u'_score': 2788,
                    u'_source': {u'content': u'They followed Bander. The robots remained at a polite distance, but their presence was a constantly felt threat. Bander ushered all three into the room. One of the robots followed as well. Bander gestured the other robots away and entered itself. The door closed behind it. ',
                                 u'title': u'Books one'},
                    u'highlight': {u'content': [u' the room. <b>One</b> of the <b>robots</b> followed as well',
                                                u'Bander gestured the other <b>robots</b> away and entered '],
                                   u'title': [u'Books <b>one</b>']}}],
          'max_score': None,
          'total': 1},
 'profile': None,
 'timed_out': False,
 'took': 0}
Javascript
res =  await searchApi.search({"index":"books","query":{"match":{"*":"one|robots"}},"highlight":{"fields":["content","title"],"fragment_size":100}});
{"took":0,"timed_out":false,"hits":{"total":1,"hits":[{"_id":"1","_score":2788,"_source":{"title":"Books one","content":"They followed Bander. The robots remained at a polite distance, but their presence was a constantly felt threat. Bander ushered all three into the room. One of the robots followed as well. Bander gestured the other robots away and entered itself. The door closed behind it. "},"highlight":{"content":[" the room. <b>One</b> of the <b>robots</b> followed as well","Bander gestured the other <b>robots</b> away and entered "],"title":["Books <b>one</b>"]}}]}}
Java
searchRequest = new SearchRequest();
searchRequest.setIndex("books");
query = new HashMap<String,Object>();
query.put("match",new HashMap<String,Object>(){{
    put("*","one|robots");
}});        
searchRequest.setQuery(query);
highlight = new HashMap<String,Object>(){{
    put("fields",new String[] {"content","title"});
    put("fragment_size",100);
}};
searchRequest.setHighlight(highlight);
searchResponse = searchApi.search(searchRequest);
C#
var searchRequest = new SearchRequest("books");
searchRequest.FulltextFilter = new MatchFilter("*", "one|robots");
var highlight = new Highlight();
highlight.Fieldnames = new List<string> {"content", "title"};
highlight.FragmentSize = 100;
searchRequest.Highlight = highlight;
var searchResponse = searchApi.Search(searchRequest);
TypeScript
res = await searchApi.search({
  index: 'test',
  query: {
    match: {

        *: 'Text 1'
    }
  },
  highlight: { fragment_size: 4}
});
{
    "took":0,
    "timed_out":false,
    "hits":
    {
        "total":1,
        "hits":
        [{
            "_id":"1",
            "_score":1480,
            "_source":
            {
                "content":"Text 1",
                "name":"Doc 1",
                "cat":1
            },
            "highlight":
            {
                "content":
                [
                    "<b>Text</b>"
                ]
            }
        ]}
    }
}
Go
matchClause := map[string]interface{} {"*": "Text 1"};
query := map[string]interface{} {"match": matchClause};
searchRequest.SetQuery(query);
highlight := manticoreclient.NewHighlight()
highlight.SetFragmentSize(4)
searchRequest.SetHighlight(highlight)
res, _, _ := apiClient.SearchAPI.Search(context.Background()).SearchRequest(*searchRequest).Execute()
{
    "took":0,
    "timed_out":false,
    "hits":
    {
        "total":1,
        "hits":
        [{
            "_id":"1",
            "_score":1480,
            "_source":
            {
                "content":"Text 1",
                "name":"Doc 1",
                "cat":1
            },
            "highlight":
            {
                "content":
                [
                    "<b>Text</b>"
                ]
            }
        ]}
    }
}

number_of_fragments

number_of_fragments limits the maximum number of snippets in the result. Like fragment_size, it can be global or per-field. This is optional, with a default value of 0 (no limit). It works similarly to the limit_snippets option.

JSON
POST /search
{
  "index": "books",
  "query": { "match": { "*": "one|robots"  } },
  "highlight":
  {
    "fields": [ "content", "title" ],
    "number_of_fragments": 10
  }
}
PHP
$index->setName('books');
$bool = new \Manticoresearch\Query\BoolQuery();
$bool->must(new \Manticoresearch\Query\Match(['query' => 'one|robots'], '*'));

$results = $index->search($bool)->highlight(['content','title'],['number_of_fragments'=>10])->get();
foreach($results as $doc)
{
    echo 'Document: '.$doc->getId()."\n";
    foreach($doc->getData() as $field=>$value)
    {
        echo $field.' : '.$value."\n";
    }
    foreach($doc->getHighlight() as $field=>$snippets)
    {
        echo "Highlight for ".$field.":\n";
        foreach($snippets as $snippet)
        {
            echo "- ".$snippet."\n";
        }
    }
}
Document: 1
title : Books one
content : They followed Bander. The robots remained at a polite distance, but their presence was a constantly felt threat. Bander ushered all three into the room. One of the robots followed as well. Bander gestured the other robots away and entered itself. The door closed behind it.
Highlight for content:
- They followed Bander. The <b>robots</b> remained at a polite distance,
-  three into the room. <b>One</b> of the <b>robots</b> followed as well. Bander
-  gestured the other <b>robots</b> away and entered itself. The
Highlight for title:
- Books <b>one</b>
Python
res =searchApi.search({"index":"books","query":{"match":{"*":"one|robots"}},"highlight":{"fields":["content","title"],"number_of_fragments":10}})
{'aggregations': None,
 'hits': {'hits': [{u'_id': u'1',
                    u'_score': 2788,
                    u'_source': {u'content': u'They followed Bander. The robots remained at a polite distance, but their presence was a constantly felt threat. Bander ushered all three into the room. One of the robots followed as well. Bander gestured the other robots away and entered itself. The door closed behind it. ',
                                 u'title': u'Books one'},
                    u'highlight': {u'content': [u'They followed Bander. The <b>robots</b> remained at a polite distance, ',
                                                u' three into the room. <b>One</b> of the <b>robots</b> followed as well. Bander',
                                                u' gestured the other <b>robots</b> away and entered itself. The'],
                                   u'title': [u'Books <b>one</b>']}}],
          'max_score': None,
          'total': 1},
 'profile': None,
 'timed_out': False,
 'took': 0}
Javascript
res =  await searchApi.search({"index":"books","query":{"match":{"*":"one|robots"}},"highlight":{"fields":["content","title"],"number_of_fragments":10}});
{"took":0,"timed_out":false,"hits":{"total":1,"hits":[{"_id":"1","_score":2788,"_source":{"title":"Books one","content":"They followed Bander. The robots remained at a polite distance, but their presence was a constantly felt threat. Bander ushered all three into the room. One of the robots followed as well. Bander gestured the other robots away and entered itself. The door closed behind it. "},"highlight":{"content":["They followed Bander. The <b>robots</b> remained at a polite distance, "," three into the room. <b>One</b> of the <b>robots</b> followed as well. Bander"," gestured the other <b>robots</b> away and entered itself. The"],"title":["Books <b>one</b>"]}}]}}
Java
searchRequest = new SearchRequest();
searchRequest.setIndex("books");
query = new HashMap<String,Object>();
query.put("match",new HashMap<String,Object>(){{
    put("*","one|robots");
}});        
searchRequest.setQuery(query);
highlight = new HashMap<String,Object>(){{
    put("fields",new String[] {"content","title"});
    put("number_of_fragments",10);
}};
searchRequest.setHighlight(highlight);
searchResponse = searchApi.search(searchRequest);
C#
var searchRequest = new SearchRequest("books");
searchRequest.FulltextFilter = new MatchFilter("*", "one|robots");
var highlight = new Highlight();
highlight.Fieldnames = new List<string> {"content", "title"};
highlight.NumberOfFragments = 10;
searchRequest.Highlight = highlight;
var searchResponse = searchApi.Search(searchRequest);
TypeScript
res = await searchApi.search({
  index: 'test',
  query: {
    match: {

        *: 'Text 1'
    }
  },
  highlight: { number_of_fragments: 1}
});
{
    "took":0,
    "timed_out":false,
    "hits":
    {
        "total":1,
        "hits":
        [{
            "_id":"1",
            "_score":1480,
            "_source":
            {
                "content":"Text 1",
                "name":"Doc 1",
                "cat":1
            },
            "highlight":
            {
                "content":
                [
                    "<b>Text 1</b>"
                ]
            }
        ]}
    }
}
Go
matchClause := map[string]interface{} {"*": "Text 1"};
query := map[string]interface{} {"match": matchClause};
searchRequest.SetQuery(query);
highlight := manticoreclient.NewHighlight()
highlight.SetNumberOfFragments(1)
searchRequest.SetHighlight(highlight)
res, _, _ := apiClient.SearchAPI.Search(context.Background()).SearchRequest(*searchRequest).Execute()
{
    "took":0,
    "timed_out":false,
    "hits":
    {
        "total":1,
        "hits":
        [{
            "_id":"1",
            "_score":1480,
            "_source":
            {
                "content":"Text 1",
                "name":"Doc 1",
                "cat":1
            },
            "highlight":
            {
                "content":
                [
                    "<b>Text 1</b>"
                ]
            }
        ]}
    }
}

limit, limit_words, limit_snippets

Options like limit, limit_words, and limit_snippets can be set as global or per-field options. Global options are used as per-field limits unless per-field options override them. In the example, the title field is highlighted with default limit settings, while the content field uses a different limit.

JSON
POST /search
{
  "index": "books",
  "query": { "match": { "*": "one|robots"  } },
      "highlight":
      {
        "fields":
        {
            "title": {},
            "content" : { "limit": 50 }
        }
      }
}
PHP
$index->setName('books');
$bool = new \Manticoresearch\Query\BoolQuery();
$bool->must(new \Manticoresearch\Query\Match(['query' => 'one|robots'], '*'));

$results = $index->search($bool)->highlight(['content'=>['limit'=>50],'title'=>new \stdClass])->get();
foreach($results as $doc)
{
    echo 'Document: '.$doc->getId()."\n";
    foreach($doc->getData() as $field=>$value)
    {
        echo $field.' : '.$value."\n";
    }
    foreach($doc->getHighlight() as $field=>$snippets)
    {
        echo "Highlight for ".$field.":\n";
        foreach($snippets as $snippet)
        {
            echo "- ".$snippet."\n";
        }
    }
}
Document: 1
title : Books one
content : They followed Bander. The robots remained at a polite distance, but their presence was a constantly felt threat. Bander ushered all three into the room. One of the robots followed as well. Bander gestured the other robots away and entered itself. The door closed behind it.
Highlight for content:
-  into the room. <b>One</b> of the <b>robots</b> followed as well
Highlight for title:
- Books <b>one</b>
Python
res =searchApi.search({"index":"books","query":{"match":{"*":"one|robots"}},"highlight":{"fields":{"title":{},"content":{"limit":50}}}})
{'aggregations': None,
 'hits': {'hits': [{u'_id': u'1',
                    u'_score': 2788,
                    u'_source': {u'content': u'They followed Bander. The robots remained at a polite distance, but their presence was a constantly felt threat. Bander ushered all three into the room. One of the robots followed as well. Bander gestured the other robots away and entered itself. The door closed behind it. ',
                                 u'title': u'Books one'},
                    u'highlight': {u'content': [u' into the room. <b>One</b> of the <b>robots</b> followed as well'],
                                   u'title': [u'Books <b>one</b>']}}],
          'max_score': None,
          'total': 1},
 'profile': None,
 'timed_out': False,
 'took': 0}
Javascript
res =  await searchApi.search({"index":"books","query":{"match":{"*":"one|robots"}},"highlight":{"fields":{"title":{},"content":{"limit":50}}}});
{"took":0,"timed_out":false,"hits":{"total":1,"hits":[{"_id":"1","_score":2788,"_source":{"title":"Books one","content":"They followed Bander. The robots remained at a polite distance, but their presence was a constantly felt threat. Bander ushered all three into the room. One of the robots followed as well. Bander gestured the other robots away and entered itself. The door closed behind it. "},"highlight":{"title":["Books <b>one</b>"],"content":[" into the room. <b>One</b> of the <b>robots</b> followed as well"]}}]}}
Java
searchRequest = new SearchRequest();
searchRequest.setIndex("books");
query = new HashMap<String,Object>();
query.put("match",new HashMap<String,Object>(){{
    put("*","one|robots");
}});        
searchRequest.setQuery(query);
highlight = new HashMap<String,Object>(){{
    put("fields",new HashMap<String,Object>(){{
            put("title",new HashMap<String,Object>(){{}});
            put("content",new HashMap<String,Object>(){{
                put("limit",50);
            }});
        }}
    );
}};
searchRequest.setHighlight(highlight);
searchResponse = searchApi.search(searchRequest);
C#
var searchRequest = new SearchRequest("books");
searchRequest.FulltextFilter = new MatchFilter("*", "one|robots");
var highlight = new Highlight();
var highlightField = new HighlightField("title");
highlightField.Limit = 50;
highlight.Fields = new List<Object> {highlightField};
searchRequest.Highlight = highlight;
var searchResponse = searchApi.Search(searchRequest);
TypeScript
res = await searchApi.search({
  index: 'test',
  query: {
    match: {

        *: 'Text 1'
    }
  },
  highlight: {
    fields: {
      content: { limit:1 }
    }
  }
});
{
    "took":0,
    "timed_out":false,
    "hits":
    {
        "total":1,
        "hits":
        [{
            "_id":"1",
            "_score":1480,
            "_source":
            {
                "content":"Text 1",
                "name":"Doc 1",
                "cat":1
            },
            "highlight":
            {
                "content":
                [
                    "<b>Text</b>"
                ]
            }
        ]}
    }
}
Go
matchClause := map[string]interface{} {"*": "Text 1"};
query := map[string]interface{} {"match": matchClause};
searchRequest.SetQuery(query);
highlight := manticoreclient.NewHighlight()
highlightField := manticoreclient.NetHighlightField("content")
highlightField.SetLimit(1);
highlightFields := []interface{} { highlightField } 
highlight.SetFields(highlightFields)
searchRequest.SetHighlight(highlight)
res, _, _ := apiClient.SearchAPI.Search(context.Background()).SearchRequest(*searchRequest).Execute()
{
    "took":0,
    "timed_out":false,
    "hits":
    {
        "total":1,
        "hits":
        [{
            "_id":"1",
            "_score":1480,
            "_source":
            {
                "content":"Text 1",
                "name":"Doc 1",
                "cat":1
            },
            "highlight":
            {
                "content":
                [
                    "<b>Text</b>"
                ]
            }
        ]}
    }
}

limits_per_field

Global limits can also be enforced by specifying limits_per_field=0. Setting this option means that all combined highlighting results must be within the specified limits. The downside is that you may get several snippets highlighted in one field and none in another if the highlighting engine decides that they are more relevant.

JSON
POST /search
{
  "index": "books",
  "query": { "match": { "content": "and first" } },
      "highlight":
      {
        "limits_per_field": false,
        "fields":
        {
            "content" : { "limit": 50 }
        }
      }
}
PHP
$index->setName('books');
$bool = new \Manticoresearch\Query\BoolQuery();
$bool->must(new \Manticoresearch\Query\Match(['query' => 'and first'], 'content'));

$results = $index->search($bool)->highlight(['content'=>['limit'=>50]],['limits_per_field'=>false])->get();
foreach($results as $doc)
{
    echo 'Document: '.$doc->getId()."\n";
    foreach($doc->getData() as $field=>$value)
    {
        echo $field.' : '.$value."\n";
    }
    foreach($doc->getHighlight() as $field=>$snippets)
    {
        echo "Highlight for ".$field.":\n";
        foreach($snippets as $snippet)
        {
            echo "- ".$snippet."\n";
        }
    }
}
Document: 1
title : Books one
content : They followed Bander. The robots remained at a polite distance, but their presence was a constantly felt threat. Bander ushered all three into the room. One of the robots followed as well. Bander gestured the other robots away and entered itself. The door closed behind it.
Highlight for content:
-  gestured the other robots away <b>and</b> entered itself. The door closed
Python
res =searchApi.search({"index":"books","query":{"match":{"content":"and first"}},"highlight":{"fields":{"content":{"limit":50}},"limits_per_field":False}})
{'aggregations': None,
 'hits': {'hits': [{u'_id': u'1',
                    u'_score': 1597,
                    u'_source': {u'content': u'They followed Bander. The robots remained at a polite distance, but their presence was a constantly felt threat. Bander ushered all three into the room. One of the robots followed as well. Bander gestured the other robots away and entered itself. The door closed behind it. ',
                                 u'title': u'Books one'},
                    u'highlight': {u'content': [u' gestured the other robots away <b>and</b> entered itself. The door closed']}}],
          'max_score': None,
          'total': 1},
 'profile': None,
 'timed_out': False,
 'took': 0}
Javascript
res =  await searchApi.search({"index":"books","query":{"match":{"content":"and first"}},"highlight":{"fields":{"content":{"limit":50}},"limits_per_field":false}});
{"took":0,"timed_out":false,"hits":{"total":1,"hits":[{"_id":"1","_score":1597,"_source":{"title":"Books one","content":"They followed Bander. The robots remained at a polite distance, but their presence was a constantly felt threat. Bander ushered all three into the room. One of the robots followed as well. Bander gestured the other robots away and entered itself. The door closed behind it. "},"highlight":{"content":[" gestured the other robots away <b>and</b> entered itself. The door closed"]}}]}}
Java
searchRequest = new SearchRequest();
searchRequest.setIndex("books");
query = new HashMap<String,Object>();
query.put("match",new HashMap<String,Object>(){{
    put("*","one|robots");
}});        
searchRequest.setQuery(query);
highlight = new HashMap<String,Object>(){{
    put("limits_per_field",0);
    put("fields",new HashMap<String,Object>(){{
            put("content",new HashMap<String,Object>(){{
                put("limit",50);
            }});
        }}
    );
}};
searchRequest.setHighlight(highlight);
searchResponse = searchApi.search(searchRequest);
C#
var searchRequest = new SearchRequest("books");
searchRequest.FulltextFilter = new MatchFilter("*", "one|robots");
var highlight = new Highlight();
highlight.LimitsPerField = 0;
var highlightField = new HighlightField("title");
highlight.Fields = new List<Object> {highlightField};
searchRequest.Highlight = highlight;
var searchResponse = searchApi.Search(searchRequest);
TypeScript
res = await searchApi.search({
  index: 'test',
  query: {
    match: {

        *: 'Text 1'
    }
  },
  highlight: { limits_per_field: 0 }
});
Go
matchClause := map[string]interface{} {"*": "Text 1"};
query := map[string]interface{} {"match": matchClause};
searchRequest.SetQuery(query);
highlight := manticoreclient.NewHighlight()
highlight.SetLimitsPerField(0)
searchRequest.SetHighlight(highlight)
res, _, _ := apiClient.SearchAPI.Search(context.Background()).SearchRequest(*searchRequest).Execute()

CALL SNIPPETS

The CALL SNIPPETS statement builds a snippet from provided data and query using specified table settings. It can't access built-in document storage, which is why it's recommended to use the HIGHLIGHT() function instead.

The syntax is:

CALL SNIPPETS(data, table, query[, opt_value AS opt_name[, ...]])

data

data serves as the source from which a snippet is extracted. It can either be a single string or a list of strings enclosed in curly brackets.

table

table refers to the name of the table that provides the text processing settings for snippet generation.

query

query is the full-text query used to build the snippets.

opt_value and opt_name

opt_value and opt_name represent the snippet generation options.

SQL
CALL SNIPPETS(('this is my document text','this is my another text'), 'forum', 'is text', 5 AS around, 200 AS limit);
+----------------------------------------+
| snippet                                |
+----------------------------------------+
| this <b>is</b> my document <b>text</b> |
| this <b>is</b> my another <b>text</b>  |
+----------------------------------------+
2 rows in set (0.02 sec)

Most options are the same as in the HIGHLIGHT() function. There are, however, several options that can only be used with CALL SNIPPETS.

The following options can be used to highlight text stored in separate files:

load_files

This option, when enabled, treats the first argument as file names instead of data to extract snippets from. The specified files on the server side will be loaded for data. Up to max_threads_per_query worker threads per request will be used to parallelize the work when this flag is enabled. Default is 0 (no limit). To distribute snippet generation between remote agents, invoke snippets generation in a distributed table containing only one(!) local agent and several remotes. The snippets_file_prefix option is used to generate the final file name. For example, when searchd is configured with snippets_file_prefix = /var/data_ and text.txt is provided as a file name, snippets will be generated from the content of /var/data_text.txt.

load_files_scattered

This option only works with distributed snippets generation with remote agents. Source files for snippet generation can be distributed among different agents, and the main server will merge all non-erroneous results. For example, if one agent of the distributed table has file1.txt, another agent has file2.txt, and you use CALL SNIPPETS with both of these files, searchd will merge agent results, so you will get results from both file1.txt and file2.txt. Default is 0.

If the load_files option is also enabled, the request will return an error if any of the files is not available anywhere. Otherwise (if load_files is not enabled), it will return empty strings for all absent files. Searchd does not pass this flag to agents, so agents do not generate a critical error if the file does not exist. If you want to be sure that all source files are loaded, set both load_files_scattered and load_files to 1. If the absence of some source files on some agent is not critical, set only load_files_scattered to 1.

SQL
CALL SNIPPETS(('data/doc1.txt','data/doc2.txt'), 'forum', 'is text', 1 AS load_files);
+----------------------------------------+
| snippet                                |
+----------------------------------------+
| this <b>is</b> my document <b>text</b> |
| this <b>is</b> my another <b>text</b>  |
+----------------------------------------+
2 rows in set (0.02 sec)
¶ 13.8

Sorting and ranking

Query results can be sorted by full-text ranking weight, one or more attributes or expressions.

Full-text queries return matches sorted by default. If nothing is specified, they are sorted by relevance, which is equivalent to ORDER BY weight() DESC in SQL format.

Non-full-text queries do not perform any sorting by default.

Advanced sorting

Extended mode is automatically enabled when you explicitly provide sorting rules by adding the ORDER BY clause in SQL format or using the sort option via HTTP JSON.

Sorting via SQL

General syntax:

SELECT ... ORDER BY
{attribute_name | expr_alias | weight() | random() } [ASC | DESC],
...
{attribute_name | expr_alias | weight() | random() } [ASC | DESC]

In the sort clause, you can use any combination of up to 5 columns, each followed by asc or desc. Functions and expressions are not allowed as arguments for the sort clause, except for the weight() and random() functions (the latter can only be used via SQL in the form of ORDER BY random()). However, you can use any expression in the SELECT list and sort by its alias.

SQL
select *, a + b alias from test order by alias desc;
+------+------+------+----------+-------+
| id   | a    | b    | f        | alias |
+------+------+------+----------+-------+
|    1 |    2 |    3 | document |     5 |
+------+------+------+----------+-------+

Sorting via JSON

"sort" specifies an array where each element can be an attribute name or _score if you want to sort by match weights. In that case, the sort order defaults to ascending for attributes and descending for _score.

JSON
{
  "index":"test",
  "query":
  {
    "match": { "title": "Test document" }
  },
  "sort": [ "_score", "id" ],
  "_source": "title",
  "limit": 3
}
    {
      "took": 0,
      "timed_out": false,
      "hits": {
        "total": 5,
        "total_relation": "eq",
        "hits": [
          {
            "_id": "5406864699109146628",
            "_score": 2319,
            "_source": {
              "title": "Test document 1"
            }
          },
          {
            "_id": "5406864699109146629",
            "_score": 2319,
            "_source": {
              "title": "Test document 2"
            }
          },
          {
            "_id": "5406864699109146630",
            "_score": 2319,
            "_source": {
              "title": "Test document 3"
            }
          }
        ]
      }
    }
PHP
$search->setIndex("test")->match('Test document')->sort('_score')->sort('id');
Python
search_request.index = 'test'
search_request.fulltext_filter = manticoresearch.model.QueryFilter('Test document')
search_request.sort = ['_score', 'id']
javascript
searchRequest.index = "test";
searchRequest.fulltext_filter = new Manticoresearch.QueryFilter('Test document');
searchRequest.sort = ['_score', 'id'];
Java
searchRequest.setIndex("test");
QueryFilter queryFilter = new QueryFilter();
queryFilter.setQueryString("Test document");
searchRequest.setFulltextFilter(queryFilter);
List<Object> sort = new ArrayList<Object>( Arrays.asList("_score", "id") );
searchRequest.setSort(sort);
C#
var searchRequest = new SearchRequest("test");
searchRequest.FulltextFilter = new QueryFilter("Test document");
searchRequest.Sort = new List<Object> {"_score", "id"};
typescript
searchRequest = {
  index: 'test',
  query: {
    query_string: {'Test document'},
  },
  sort: ['_score', 'id'],
}
go
searchRequest.SetIndex("test")
query := map[string]interface{} {"query_string": "Test document"}
searchRequest.SetQuery(query)
sort := map[string]interface{} {"_score": "asc", "id": "asc"}
searchRequest.SetSort(sort)

You can also specify the sort order explicitly:

JSON
{
  "index":"test",
  "query":
  {
    "match": { "title": "Test document" }
  },
  "sort":
  [
    { "id": "desc" },
    "_score"
  ],
  "_source": "title",
  "limit": 3
}
    {
      "took": 0,
      "timed_out": false,
      "hits": {
        "total": 5,
        "total_relation": "eq",
        "hits": [
          {
            "_id": "5406864699109146632",
            "_score": 2319,
            "_source": {
              "title": "Test document 5"
            }
          },
          {
            "_id": "5406864699109146631",
            "_score": 2319,
            "_source": {
              "title": "Test document 4"
            }
          },
          {
            "_id": "5406864699109146630",
            "_score": 2319,
            "_source": {
              "title": "Test document 3"
            }
          }
        ]
      }
    }
PHP
$search->setIndex("test")->match('Test document')->sort('id', 'desc')->sort('_score');
Python
search_request.index = 'test'
search_request.fulltext_filter = manticoresearch.model.QueryFilter('Test document')
sort_by_id = manticoresearch.model.SortOrder('id', 'desc')
search_request.sort = [sort_by_id, '_score']
javascript
searchRequest.index = "test";
searchRequest.fulltext_filter = new Manticoresearch.QueryFilter('Test document');
sortById = new Manticoresearch.SortOrder('id', 'desc');
searchRequest.sort = [sortById, 'id'];
Java
searchRequest.setIndex("test");
QueryFilter queryFilter = new QueryFilter();
queryFilter.setQueryString("Test document");
searchRequest.setFulltextFilter(queryFilter);
List<Object> sort = new ArrayList<Object>();
SortOrder sortById = new SortOrder();
sortById.setAttr("id");
sortById.setOrder(SortOrder.OrderEnum.DESC);
sort.add(sortById);
sort.add("_score");
searchRequest.setSort(sort);
C#
var searchRequest = new SearchRequest("test");
searchRequest.FulltextFilter = new QueryFilter("Test document");
searchRequest.Sort = new List<Object>();
var sortById = new SortOrder("id", SortOrder.OrderEnum.Desc);
searchRequest.Sort.Add(sortById);
searchRequest.Sort.Add("_score");
typescript
searchRequest = {
  index: 'test',
  query: {
    query_string: {'Test document'},
  },
  sort: [{'id': 'desc'}, '_score'],
}
go
searchRequest.SetIndex("test")
query := map[string]interface{} {"query_string": "Test document"}
searchRequest.SetQuery(query)
sortById := map[string]interface{} {"id": "desc"}
sort := map[string]interface{} {"id": "desc", "_score": "asc"}
searchRequest.SetSort(sort)

You can also use another syntax and specify the sort order via the order property:

JSON
{
  "index":"test",
  "query":
  {
    "match": { "title": "Test document" }
  },
  "sort":
  [
    { "id": { "order":"desc" } }
  ],
  "_source": "title",
  "limit": 3
}
    {
      "took": 0,
      "timed_out": false,
      "hits": {
        "total": 5,
        "total_relation": "eq",
        "hits": [
          {
            "_id": "5406864699109146632",
            "_score": 2319,
            "_source": {
              "title": "Test document 5"
            }
          },
          {
            "_id": "5406864699109146631",
            "_score": 2319,
            "_source": {
              "title": "Test document 4"
            }
          },
          {
            "_id": "5406864699109146630",
            "_score": 2319,
            "_source": {
              "title": "Test document 3"
            }
          }
        ]
      }
    }
PHP
$search->setIndex("test")->match('Test document')->sort('id', 'desc');
Python
search_request.index = 'test'
search_request.fulltext_filter = manticoresearch.model.QueryFilter('Test document')
sort_by_id = manticoresearch.model.SortOrder('id', 'desc')
search_request.sort = [sort_by_id]
javascript
searchRequest.index = "test";
searchRequest.fulltext_filter = new Manticoresearch.QueryFilter('Test document');
sortById = new Manticoresearch.SortOrder('id', 'desc');
searchRequest.sort = [sortById];
Java
searchRequest.setIndex("test");
QueryFilter queryFilter = new QueryFilter();
queryFilter.setQueryString("Test document");
searchRequest.setFulltextFilter(queryFilter);
List<Object> sort = new ArrayList<Object>();
SortOrder sortById = new SortOrder();
sortById.setAttr("id");
sortById.setOrder(SortOrder.OrderEnum.DESC);
sort.add(sortById);
searchRequest.setSort(sort);
C#
var searchRequest = new SearchRequest("test");
searchRequest.FulltextFilter = new QueryFilter("Test document");
searchRequest.Sort = new List<Object>();
var sortById = new SortOrder("id", SortOrder.OrderEnum.Desc);
searchRequest.Sort.Add(sortById);
typescript
searchRequest = {
  index: 'test',
  query: {
    query_string: {'Test document'},
  },
  sort: { {'id': {'order':'desc'} },
}
go
searchRequest.SetIndex("test")
query := map[string]interface{} {"query_string": "Test document"}
searchRequest.SetQuery(query)
sort := map[string]interface{} { "id": {"order":"desc"} }
searchRequest.SetSort(sort)

Sorting by MVA attributes is also supported in JSON queries. Sorting mode can be set via the mode property. The following modes are supported:

JSON
{
  "index":"test",
  "query":
  {
    "match": { "title": "Test document" }
  },
  "sort":
  [
    { "attr_mva": { "order":"desc", "mode":"max" } }
  ],
  "_source": "title",
  "limit": 3
}
    {
      "took": 0,
      "timed_out": false,
      "hits": {
        "total": 5,
        "total_relation": "eq",
        "hits": [
          {
            "_id": "5406864699109146631",
            "_score": 2319,
            "_source": {
              "title": "Test document 4"
            }
          },
          {
            "_id": "5406864699109146629",
            "_score": 2319,
            "_source": {
              "title": "Test document 2"
            }
          },
          {
            "_id": "5406864699109146628",
            "_score": 2319,
            "_source": {
              "title": "Test document 1"
            }
          }
        ]
      }
    }
PHP
$search->setIndex("test")->match('Test document')->sort('id','desc','max');
Python
search_request.index = 'test'
search_request.fulltext_filter = manticoresearch.model.QueryFilter('Test document')
sort = manticoresearch.model.SortMVA('attr_mva', 'desc', 'max')
search_request.sort = [sort]
javascript
searchRequest.index = "test";
searchRequest.fulltext_filter = new Manticoresearch.QueryFilter('Test document');
sort = new Manticoresearch.SortMVA('attr_mva', 'desc', 'max');
searchRequest.sort = [sort];
Java
searchRequest.setIndex("test");
QueryFilter queryFilter = new QueryFilter();
queryFilter.setQueryString("Test document");
searchRequest.setFulltextFilter(queryFilter);
SortMVA sort = new SortMVA();
sort.setAttr("attr_mva");
sort.setOrder(SortMVA.OrderEnum.DESC);
sort.setMode(SortMVA.ModeEnum.MAX);
searchRequest.setSort(sort);
C#
var searchRequest = new SearchRequest("test");
searchRequest.FulltextFilter = new QueryFilter("Test document");
var sort = new SortMVA("attr_mva", SortMVA.OrderEnum.Desc, SortMVA.ModeEnum.Max);
searchRequest.Sort.Add(sort);
typescript
searchRequest = {
  index: 'test',
  query: {
    query_string: {'Test document'},
  },
  sort: { "attr_mva": { "order":"desc", "mode":"max" } },
}
go
searchRequest.SetIndex("test")
query := map[string]interface{} {"query_string": "Test document"}
searchRequest.SetQuery(query)
sort := map[string]interface{} { "attr_mva": { "order":"desc", "mode":"max" } }
searchRequest.SetSort(sort)

When sorting on an attribute, match weight (score) calculation is disabled by default (no ranker is used). You can enable weight calculation by setting the track_scores property to true:

JSON
{
  "index":"test",
  "track_scores": true,
  "query":
  {
    "match": { "title": "Test document" }
  },
  "sort":
  [
    { "attr_mva": { "order":"desc", "mode":"max" } }
  ],
  "_source": "title",
  "limit": 3
}
    {
      "took": 0,
      "timed_out": false,
      "hits": {
        "total": 5,
        "total_relation": "eq",
        "hits": [
          {
            "_id": "5406864699109146631",
            "_score": 2319,
            "_source": {
              "title": "Test document 4"
            }
          },
          {
            "_id": "5406864699109146629",
            "_score": 2319,
            "_source": {
              "title": "Test document 2"
            }
          },
          {
            "_id": "5406864699109146628",
            "_score": 2319,
            "_source": {
              "title": "Test document 1"
            }
          }
        ]
      }
    }
PHP
$search->setIndex("test")->match('Test document')->sort('id','desc','max')->trackScores(true);
Python
search_request.index = 'test'
search_request.track_scores = true
search_request.fulltext_filter = manticoresearch.model.QueryFilter('Test document')
sort = manticoresearch.model.SortMVA('attr_mva', 'desc', 'max')
search_request.sort = [sort]
javascript
searchRequest.index = "test";
searchRequest.trackScores = true;
searchRequest.fulltext_filter = new Manticoresearch.QueryFilter('Test document');
sort = new Manticoresearch.SortMVA('attr_mva', 'desc', 'max');
searchRequest.sort = [sort];
Java
searchRequest.setIndex("test");
searchRequest.setTrackScores(true);
QueryFilter queryFilter = new QueryFilter();
queryFilter.setQueryString("Test document");
searchRequest.setFulltextFilter(queryFilter);
SortMVA sort = new SortMVA();
sort.setAttr("attr_mva");
sort.setOrder(SortMVA.OrderEnum.DESC);
sort.setMode(SortMVA.ModeEnum.MAX);
searchRequest.setSort(sort);
C#
var searchRequest = new SearchRequest("test");
searchRequest.SetTrackScores(true);
searchRequest.FulltextFilter = new QueryFilter("Test document");
var sort = new SortMVA("attr_mva", SortMVA.OrderEnum.Desc, SortMVA.ModeEnum.Max);
searchRequest.Sort.Add(sort);
typescript
searchRequest = {
  index: 'test',
  track_scores: true,
  query: {
    query_string: {'Test document'},
  },
  sort: { "attr_mva": { "order":"desc", "mode":"max" } },
}
go
searchRequest.SetIndex("test")
searchRequest.SetTrackScores(true)
query := map[string]interface{} {"query_string": "Test document"}
searchRequest.SetQuery(query)
sort := map[string]interface{} { "attr_mva": { "order":"desc", "mode":"max" } }
searchRequest.SetSort(sort)

Ranking overview

Ranking (also known as weighting) of search results can be defined as a process of computing a so-called relevance (weight) for every given matched document regards to a given query that matched it. So relevance is, in the end, just a number attached to every document that estimates how relevant the document is to the query. Search results can then be sorted based on this number and/or some additional parameters, so that the most sought-after results would appear higher on the results page.

There is no single standard one-size-fits-all way to rank any document in any scenario. Moreover, there can never be such a way, because relevance is subjective. As in, what seems relevant to you might not seem relevant to me. Hence, in general cases, it's not just hard to compute; it's theoretically impossible.

So ranking in Manticore is configurable. It has a notion of a so-called ranker. A ranker can formally be defined as a function that takes a document and a query as its input and produces a relevance value as output. In layman's terms, a ranker controls exactly how (using which specific algorithm) Manticore will assign weights to the documents.

Available built-in rankers

Manticore ships with several built-in rankers suited for different purposes. Many of them use two factors: phrase proximity (also known as LCS) and BM25. Phrase proximity works on keyword positions, while BM25 works on keyword frequencies. Essentially, the better the degree of phrase match between the document body and the query, the higher the phrase proximity (it maxes out when the document contains the entire query as a verbatim quote). And BM25 is higher when the document contains more rare words. We'll save the detailed discussion for later.

The currently implemented rankers are:

The ranker name is case-insensitive. Example:

SELECT ... OPTION ranker=sph04;

Quick summary of the ranking factors

Name Level Type Summary
max_lcs query int maximum possible LCS value for the current query
bm25 document int quick estimate of BM25(1.2, 0)
bm25a(k1, b) document int precise BM25() value with configurable K1, B constants and syntax support
bm25f(k1, b, {field=weight, ...}) document int precise BM25F() value with extra configurable field weights
field_mask document int bit mask of matched fields
query_word_count document int number of unique inclusive keywords in a query
doc_word_count document int number of unique keywords matched in the document
lcs field int Longest Common Subsequence between query and document, in words
user_weight field int user field weight
hit_count field int total number of keyword occurrences
word_count field int number of unique matched keywords
tf_idf field float sum(tf*idf) over matched keywords == sum(idf) over occurrences
min_hit_pos field int first matched occurrence position, in words, 1-based
min_best_span_pos field int first maximum LCS span position, in words, 1-based
exact_hit field bool whether query == field
min_idf field float min(idf) over matched keywords
max_idf field float max(idf) over matched keywords
sum_idf field float sum(idf) over matched keywords
exact_order field bool whether all query keywords were a) matched and b) in query order
min_gaps field int minimum number of gaps between the matched keywords over the matching spans
lccs field int Longest Common Contiguous Subsequence between query and document, in words
wlccs field float Weighted Longest Common Contiguous Subsequence, sum(idf) over contiguous keyword spans
atc field float Aggregate Term Closeness, log(1+sum(idf1idf2pow(distance, -1.75)) over the best pairs of keywords

Document-level Ranking Factors

A document-level factor is a numeric value computed by the ranking engine for every matched document with regards to the current query. So it differs from a plain document attribute in that the attribute does not depend on the full text query, while factors might. These factors can be used anywhere in the ranking expression. Currently implemented document-level factors are:

Field-level Ranking Factors

A field-level factor is a numeric value computed by the ranking engine for every matched in-document text field regards to the current query. As more than one field can be matched by a query, but the final weight needs to be a single integer value, these values need to be folded into a single one. To achieve that, field-level factors can only be used within a field aggregation function, they can not be used anywhere in the expression. For example, you cannot use (lcs+bm25) as your ranking expression, as lcs takes multiple values (one in every matched field). You should use (sum(lcs)+bm25) instead, that expression sums lcs over all matching fields, and then adds bm25 to that per-field sum. Currently implemented field-level factors are:

Therefore, this is a relatively low-level, "raw" factor that you'll likely want to adjust before using it for ranking. The specific adjustments depend heavily on your data and the resulting formula, but here are a few ideas to start with: (a) any min_gaps-based boosts could be simply ignored when word_count<2;

(b) non-trivial min_gaps values (i.e., when word_count>=2) could be clamped with a certain "worst-case" constant, while trivial values (i.e., when min_gaps=0 and word_count<2) could be replaced by that constant;

(c) a transfer function like 1/(1+min_gaps) could be applied (so that better, smaller min_gaps values would maximize it, and worse, larger min_gaps values would fall off slowly); and so on.

In other words, we process the best (closest) matched keyword pairs in the document, and compute pairwise "closenesses" as the product of their IDFs scaled by the distance coefficient:

pair_tc = idf(pair_word1) * idf(pair_word2) * pow(pair_distance, -1.75)

We then sum such closenesses, and compute the final, log-dampened ATC value:

atc = log(1+sum(pair_tc))

Note that this final dampening logarithm is precisely the reason you should use OPTION idf=plain because, without it, the expression inside the log() could be negative.

Having closer keyword occurrences contributes much more to ATC than having more frequent keywords. Indeed, when the keywords are right next to each other, distance=1 and k=1; when there's just one word in between them, distance=2 and k=0.297, with two words between, distance=3 and k=0.146, and so on. At the same time, IDF attenuates somewhat slower. For example, in a 1 million document collection, the IDF values for keywords that match in 10, 100, and 1000 documents would be respectively 0.833, 0.667, and 0.500. So a keyword pair with two rather rare keywords that occur in just 10 documents each but with 2 other words in between would yield pair_tc = 0.101 and thus barely outweigh a pair with a 100-doc and a 1000-doc keyword with 1 other word between them and pair_tc = 0.099. Moreover, a pair of two unique, 1-doc keywords with 3 words between them would get a pair_tc = 0.088 and lose to a pair of two 1000-doc keywords located right next to each other and yielding a pair_tc = 0.25. So, basically, while ATC does combine both keyword frequency and proximity, it still somewhat favors proximity.

Ranking factor aggregation functions

A field aggregation function is a single-argument function that accepts an expression with field-level factors, iterates over all matched fields, and computes the final results. The currently implemented field aggregation functions include:

Formula expressions for all the built-in rankers

Most other rankers can actually be emulated using the expression-based ranker. You just need to provide an appropriate expression. While this emulation will likely be slower than using the built-in, compiled ranker, it may still be interesting if you want to fine-tune your ranking formula starting with one of the existing ones. Additionally, the formulas describe the ranker details in a clear, readable manner.

Configuration of IDF formula

The historically default IDF (Inverse Document Frequency) in Manticore is equivalent to OPTION idf='normalized,tfidf_normalized', and those normalizations may cause several undesired effects.

First, idf=normalized causes keyword penalization. For instance, if you search for the | something and the occurs in more than 50% of the documents, then documents with both keywords the and[something will get less weight than documents with just one keyword something. Using OPTION idf=plain avoids this.

Plain IDF varies in [0, log(N)] range, and keywords are never penalized; while the normalized IDF varies in [-log(N), log(N)] range, and too frequent keywords are penalized.

Second, idf=tfidf_normalized causes IDF drift over queries. Historically, we additionally divided IDF by query keyword count, so that the entire sum(tf*idf) over all keywords would still fit into [0,1] range. However, that means that queries word1 and word1 | nonmatchingword2 would assign different weights to the exactly same result set, because the IDFs for both word1 and nonmatchingword2 would be divided by 2. OPTION idf='tfidf_unnormalized' fixes that. Note that BM25, BM25A, BM25F() ranking factors will be scale accordingly once you disable this normalization.

IDF flags can be mixed; plain and normalized are mutually exclusive;tfidf_unnormalized and tfidf_normalized are mutually exclusive; and unspecified flags in such a mutually exclusive group take their defaults. That means that OPTION idf=plain is equivalent to a complete OPTION idf='plain,tfidf_normalized' specification.

¶ 13.9

Pagination of search results

Manticore Search returns the top 20 matched documents in the result set by default.

SQL

In SQL, you can navigate through the result set using the LIMIT clause.

LIMIT can accept either one number as the size of the returned set with a zero offset, or a pair of offset and size values.

HTTP JSON

When using HTTP JSON, the nodes offset and limit control the offset of the result set and the size of the returned set. Alternatively, you can use the pair size and from instead.

SQL
SELECT  ... FROM ...  [LIMIT [offset,] row_count]
SELECT  ... FROM ...  [LIMIT row_count][ OFFSET offset]
JSON
{
  "index": "<index_name>",
  "query": ...
  ...  
  "limit": 20,
  "offset": 0
}
{
  "index": "<index_name>",
  "query": ...
  ...  
  "size": 20,
  "from": 0
}

Result set window

By default, Manticore Search uses a result set window of 1000 best-ranked documents that can be returned in the result set. If the result set is paginated beyond this value, the query will end in error.

This limitation can be adjusted with the query option max_matches.

Increasing the max_matches to very high values should only be done if it's necessary for the navigation to reach such points. A high max_matches value requires more memory and can increase the query response time. One way to work with deep result sets is to set max_matches as the sum of the offset and limit.

Lowering max_matches below 1000 has the benefit of reducing the memory used by the query. It can also reduce the query time, but in most cases, it might not be a noticeable gain.

SQL
SELECT  ... FROM ...   OPTION max_matches=<value>
JSON
{
  "index": "<index_name>",
  "query": ...
  ...
  "max_matches":<value>
  }
}
¶ 13.10

Distributed searching

Manticore is designed to scale effectively through its distributed searching capabilities. Distributed searching is beneficial for improving query latency (i.e., search time) and throughput (i.e., max queries/sec) in multi-server, multi-CPU, or multi-core environments. This is crucial for applications that need to search through vast amounts of data (i.e., billions of records and terabytes of text).

The primary concept is to horizontally partition the searched data across search nodes and process it in parallel.

Partitioning is done manually. To set it up, you should:

This type of table only contains references to other local and remote tables - so it cannot be directly reindexed. Instead, you should reindex the tables that it references.

When Manticore receives a query against a distributed table, it performs the following steps:

  1. Connects to the configured remote agents
  2. Sends the query to them
  3. Simultaneously searches the configured local tables (while the remote agents are searching)
  4. Retrieves the search results from the remote agents
  5. Merges all the results together, removing duplicates
  6. Sends the merged results to the client

From the application's perspective, there are no differences between searching through a regular table or a distributed table. In other words, distributed tables are fully transparent to the application, and there's no way to tell whether the table you queried was distributed or local.

Learn more about remote nodes.

¶ 13.11

Multi-queries

Multi-queries, or query batches, allow you to send multiple search queries to Manticore in a single network request.

👍 Why use multi-queries?

The primary reason is performance. By sending requests to Manticore in a batch instead of one by one, you save time by reducing network round-trips. Additionally, sending queries in a batch allows Manticore to perform certain internal optimizations. If no batch optimizations can be applied, queries will be processed individually.

⛔ When not to use multi-queries?

Multi-queries require all search queries in a batch to be independent, which isn't always the case. Sometimes query B depends on query A's results, meaning query B can only be set up after executing query A. For example, you might want to display results from a secondary index only if no results were found in the primary table, or you may want to specify an offset into the 2nd result set based on the number of matches in the 1st result set. In these cases, you'll need to use separate queries (or separate batches).

You can run multiple search queries with SQL by separating them with a semicolon. When Manticore receives a query formatted like this from a client, all inter-statement optimizations will be applied.

Multi-queries don't support queries with FACET. The number of multi-queries in one batch shouldn't exceed max_batch_queries.

SQL
SELECT id, price FROM products WHERE MATCH('remove hair') ORDER BY price DESC; SELECT id, price FROM products WHERE MATCH('remove hair') ORDER BY price ASC

Multi-queries optimizations

There are two major optimizations to be aware of: common query optimization and common subtree optimization.

Common query optimization means that searchd will identify all those queries in a batch where only the sorting and group-by settings differ, and only perform searching once. For example, if a batch consists of 3 queries, all of them are for "ipod nano", but the 1st query requests the top-10 results sorted by price, the 2nd query groups by vendor ID and requests the top-5 vendors sorted by rating, and the 3rd query requests the max price, full-text search for "ipod nano" will only be performed once, and its results will be reused to build 3 different result sets.

Faceted search is a particularly important case that benefits from this optimization. Indeed, faceted searching can be implemented by running several queries, one to retrieve search results themselves, and a few others with the same full-text query but different group-by settings to retrieve all the required groups of results (top-3 authors, top-5 vendors, etc). As long as the full-text query and filtering settings stay the same, common query optimization will trigger, and greatly improve performance.

Common subtree optimization is even more interesting. It allows searchd to exploit similarities between batched full-text queries. It identifies common full-text query parts (subtrees) in all queries and caches them between queries. For example, consider the following query batch:

donald trump president
donald trump barack obama john mccain
donald trump speech

There's a common two-word part donald trump that can be computed only once, then cached and shared across the queries. And common subtree optimization does just that. Per-query cache size is strictly controlled by subtree_docs_cache and subtree_hits_cache directives (so that caching all sixteen gazillions of documents that match "i am" does not exhaust the RAM and instantly kill your server).

How can you tell if the queries in the batch were actually optimized? If they were, the respective query log will have a "multiplier" field that specifies how many queries were processed together:

Note the "x3" field. It means that this query was optimized and processed in a sub-batch of 3 queries.

log
[Sun Jul 12 15:18:17.000 2009] 0.040 sec x3 [ext/0/rel 747541 (0,20)] [lj] the
[Sun Jul 12 15:18:17.000 2009] 0.040 sec x3 [ext/0/ext 747541 (0,20)] [lj] the
[Sun Jul 12 15:18:17.000 2009] 0.040 sec x3 [ext/0/ext 747541 (0,20)] [lj] the

For reference, this is how the regular log would look like if the queries were not batched:

log
[Sun Jul 12 15:18:17.062 2009] 0.059 sec [ext/0/rel 747541 (0,20)] [lj] the
[Sun Jul 12 15:18:17.156 2009] 0.091 sec [ext/0/ext 747541 (0,20)] [lj] the
[Sun Jul 12 15:18:17.250 2009] 0.092 sec [ext/0/ext 747541 (0,20)] [lj] the

Notice how the per-query time in the multi-query case improved by a factor of 1.5x to 2.3x, depending on the specific sorting mode.

¶ 13.12

Sub-selects

Manticore supports SELECT subqueries via SQL in the following format:

SELECT * FROM (SELECT ... ORDER BY cond1 LIMIT X) ORDER BY cond2 LIMIT Y

The outer select allows only ORDER BY and LIMIT clauses. Sub-select queries currently have two use cases:

  1. When you have a query with two ranking UDFs, one very fast and the other slow, and perform a full-text search with a large match result set. Without subselect, the query would look like:
SELECT id,slow_rank() as slow,fast_rank() as fast FROM index
    WHERE MATCH(some common query terms) ORDER BY fast DESC, slow DESC LIMIT 20
    OPTION max_matches=1000;

With sub-selects, the query can be rewritten as:

SELECT * FROM
    (SELECT id,slow_rank() as slow,fast_rank() as fast FROM index WHERE
        MATCH(some common query terms)
        ORDER BY fast DESC LIMIT 100 OPTION max_matches=1000)
ORDER BY slow DESC LIMIT 20;

In the initial query, the slow_rank() UDF is computed for the entire match result set. With SELECT sub-queries, only fast_rank() is computed for the entire match result set, while slow_rank() is computed for a limited set.

  1. The second case is useful for large result sets coming from a distributed table.

For this query:

SELECT * FROM my_dist_index WHERE some_conditions LIMIT 50000;

If you have 20 nodes, each node can send back to the master a maximum of 50K records, resulting in 20 x 50K = 1M records. However, since the master sends back only 50K (out of 1M), it might be good enough for the nodes to send only the top 10K records. With sub-select, you can rewrite the query as:

SELECT * FROM
        (SELECT * FROM my_dist_index WHERE some_conditions LIMIT 10000)
    ORDER by some_attr LIMIT 50000;

In this case, the nodes receive only the inner query and execute it. This means the master will receive only 20x10K=200K records. The master will take all the records received, reorder them by the OUTER clause, and return the best 50K records. The sub-select helps reduce the traffic between the master and the nodes, as well as reduce the master's computation time (since it processes only 200K instead of 1M records).

¶ 13.13

Grouping search results

Grouping search results is often helpful for obtaining per-group match counts or other aggregations. For example, it's useful for creating a graph illustrating the number of matching blog posts per month or grouping web search results by site or forum posts by author, etc.

Manticore supports the grouping of search results by single or multiple columns and computed expressions. The results can:

The general syntax is:

SQL
General syntax
SELECT {* | SELECT_expr [, SELECT_expr ...]}
...
GROUP BY {field_name | alias } [, ...]
[HAVING where_condition]
[WITHIN GROUP ORDER BY field_name {ASC | DESC} [, ...]]
...

SELECT_expr: { field_name | function_name(...) }
where_condition: {aggregation expression alias | COUNT(*)}
JSON
JSON query format currently supports a basic grouping that can retrieve aggregate values and their count(*).
{
  "index": "<index_name>",
  "limit": 0,
  "aggs": {
    "<aggr_name>": {
      "terms": {
        "field": "<attribute>",
        "size": <int value>
      }
    }
  }
}
The standard query output returns the result set without grouping, which can be hidden using `limit` (or `size`). The aggregation requires setting a `size` for the group's result set size.

Just Grouping

Grouping is quite simple - just add "GROUP BY smth" to the end of your SELECT query. The something can be:

You can omit any aggregation functions in the SELECT list and it will still work:

SQL
SELECT release_year FROM films GROUP BY release_year LIMIT 5;
+--------------+
| release_year |
+--------------+
|         2004 |
|         2002 |
|         2001 |
|         2005 |
|         2000 |
+--------------+

In most cases, however, you'll want to obtain some aggregated data for each group, such as:

SQL1
SELECT release_year, count(*) FROM films GROUP BY release_year LIMIT 5;
+--------------+----------+
| release_year | count(*) |
+--------------+----------+
|         2004 |      108 |
|         2002 |      108 |
|         2001 |       91 |
|         2005 |       93 |
|         2000 |       97 |
+--------------+----------+
SQL2
SELECT release_year, AVG(rental_rate) FROM films GROUP BY release_year LIMIT 5;
+--------------+------------------+
| release_year | avg(rental_rate) |
+--------------+------------------+
|         2004 |       2.78629661 |
|         2002 |       3.08259249 |
|         2001 |       3.09989142 |
|         2005 |       2.90397978 |
|         2000 |       3.17556739 |
+--------------+------------------+
JSON
POST /search -d '
    {
     "index" : "films",
     "limit": 0,
     "aggs" :
     {
        "release_year" :
         {
            "terms" :
             {
              "field":"release_year",
              "size":100
             }
         }
     }
    }
'
{
  "took": 2,
  "timed_out": false,
  "hits": {
    "total": 10000,
    "hits": [

    ]
  },
  "release_year": {
    "group_brand_id": {
      "buckets": [
        {
          "key": 2004,
          "doc_count": 108
        },
        {
          "key": 2002,
          "doc_count": 108
        },
        {
          "key": 2000,
          "doc_count": 97
        },
        {
          "key": 2005,
          "doc_count": 93
        },
        {
          "key": 2001,
          "doc_count": 91
        }
      ]
    }
  }
}
PHP
$index->setName('films');
$search = $index->search('');
$search->limit(0);
$search->facet('release_year','release_year',100);
$results = $search->get();
print_r($results->getFacets());
Array
(
    [release_year] => Array
        (
            [buckets] => Array
                (
                    [0] => Array
                        (
                            [key] => 2009
                            [doc_count] => 99
                        )
                    [1] => Array
                        (
                            [key] => 2008
                            [doc_count] => 102
                        )
                    [2] => Array
                        (
                            [key] => 2007
                            [doc_count] => 93
                        )
                    [3] => Array
                        (
                            [key] => 2006
                            [doc_count] => 103
                        )
                    [4] => Array
                        (
                            [key] => 2005
                            [doc_count] => 93
                        )
                    [5] => Array
                        (
                            [key] => 2004
                            [doc_count] => 108
                        )
                    [6] => Array
                        (
                            [key] => 2003
                            [doc_count] => 106
                        )
                    [7] => Array
                        (
                            [key] => 2002
                            [doc_count] => 108
                        )
                    [8] => Array
                        (
                            [key] => 2001
                            [doc_count] => 91
                        )
                    [9] => Array
                        (
                            [key] => 2000
                            [doc_count] => 97
                        )
                )
        )
)
Python
res =searchApi.search({"index":"films","limit":0,"aggs":{"release_year":{"terms":{"field":"release_year","size":100}}}})
{'aggregations': {u'release_year': {u'buckets': [{u'doc_count': 99,
                                                  u'key': 2009},
                                                 {u'doc_count': 102,
                                                  u'key': 2008},
                                                 {u'doc_count': 93,
                                                  u'key': 2007},
                                                 {u'doc_count': 103,
                                                  u'key': 2006},
                                                 {u'doc_count': 93,
                                                  u'key': 2005},
                                                 {u'doc_count': 108,
                                                  u'key': 2004},
                                                 {u'doc_count': 106,
                                                  u'key': 2003},
                                                 {u'doc_count': 108,
                                                  u'key': 2002},
                                                 {u'doc_count': 91,
                                                  u'key': 2001},
                                                 {u'doc_count': 97,
                                                  u'key': 2000}]}},
 'hits': {'hits': [], 'max_score': None, 'total': 1000},
 'profile': None,
 'timed_out': False,
 'took': 0}
Javascript
res = await searchApi.search({"index":"films","limit":0,"aggs":{"release_year":{"terms":{"field":"release_year","size":100}}}});
{"took":0,"timed_out":false,"aggregations":{"release_year":{"buckets":[{"key":2009,"doc_count":99},{"key":2008,"doc_count":102},{"key":2007,"doc_count":93},{"key":2006,"doc_count":103},{"key":2005,"doc_count":93},{"key":2004,"doc_count":108},{"key":2003,"doc_count":106},{"key":2002,"doc_count":108},{"key":2001,"doc_count":91},{"key":2000,"doc_count":97}]}},"hits":{"total":1000,"hits":[]}}
Java
HashMap<String,Object> aggs = new HashMap<String,Object>(){{
    put("release_year", new HashMap<String,Object>(){{
        put("terms", new HashMap<String,Object>(){{
            put("field","release_year");
            put("size",100);
        }});
    }});
}};

searchRequest = new SearchRequest();
searchRequest.setIndex("films");        
searchRequest.setLimit(0);
query = new HashMap<String,Object>();
query.put("match_all",null);
searchRequest.setQuery(query);
searchRequest.setAggs(aggs);
searchResponse = searchApi.search(searchRequest);
class SearchResponse {
    took: 0
    timedOut: false
    aggregations: {release_year={buckets=[{key=2009, doc_count=99}, {key=2008, doc_count=102}, {key=2007, doc_count=93}, {key=2006, doc_count=103}, {key=2005, doc_count=93}, {key=2004, doc_count=108}, {key=2003, doc_count=106}, {key=2002, doc_count=108}, {key=2001, doc_count=91}, {key=2000, doc_count=97}]}}
    hits: class SearchResponseHits {
        maxScore: null
        total: 1000
        hits: []
    }
    profile: null
}
C#
var agg = new Aggregation("release_year", "release_year");
agg.Size = 100;
object query = new { match_all=null };
var searchRequest = new SearchRequest("films", query);
searchRequest.Aggs = new List<Aggregation> {agg};
var searchResponse = searchApi.Search(searchRequest);
class SearchResponse {
    took: 0
    timedOut: false
    aggregations: {release_year={buckets=[{key=2009, doc_count=99}, {key=2008, doc_count=102}, {key=2007, doc_count=93}, {key=2006, doc_count=103}, {key=2005, doc_count=93}, {key=2004, doc_count=108}, {key=2003, doc_count=106}, {key=2002, doc_count=108}, {key=2001, doc_count=91}, {key=2000, doc_count=97}]}}
    hits: class SearchResponseHits {
        maxScore: null
        total: 1000
        hits: []
    }
    profile: null
}
TypeScript
res = await searchApi.search({
  index: 'test',
  limit: 0,
  aggs: {
    cat_id: {
      terms: { field: "cat", size: 1 }
    }
  }
});
{
    "took":0,
    "timed_out":false,
    "aggregations":
    {
        "cat_id":
        {
            "buckets":
            [{
                "key":1,
                "doc_count":1
            }]
        }
    },
    "hits":
    {
        "total":5,
        "hits":[]
    }
}
Go
query := map[string]interface{} {};
searchRequest.SetQuery(query);
aggTerms := manticoreclient.NewAggregationTerms()
aggTerms.SetField("cat")
aggTerms.SetSize(1)
aggregation := manticoreclient.NewAggregation()
aggregation.setTerms(aggTerms)
searchRequest.SetAggregation(aggregation)
res, _, _ := apiClient.SearchAPI.Search(context.Background()).SearchRequest(*searchRequest).Execute()
{
    "took":0,
    "timed_out":false,
    "aggregations":
    {
        "cat_id":
        {
            "buckets":
            [{
                "key":1,
                "doc_count":1
            }]
        }
    },
    "hits":
    {
        "total":5,
        "hits":[]
    }
}

By default, groups are not sorted, and the next thing you typically want to do is order them by something, like the field you're grouping by:

SQL
SELECT release_year, count(*) from films GROUP BY release_year ORDER BY release_year asc limit 5;
+--------------+----------+
| release_year | count(*) |
+--------------+----------+
|         2000 |       97 |
|         2001 |       91 |
|         2002 |      108 |
|         2003 |      106 |
|         2004 |      108 |
+--------------+----------+

Alternatively, you can sort by the aggregation:

SQL1
SELECT release_year, count(*) FROM films GROUP BY release_year ORDER BY count(*) desc LIMIT 5;
+--------------+----------+
| release_year | count(*) |
+--------------+----------+
|         2004 |      108 |
|         2002 |      108 |
|         2003 |      106 |
|         2006 |      103 |
|         2008 |      102 |
+--------------+----------+
SQL2
SELECT release_year, AVG(rental_rate) avg FROM films GROUP BY release_year ORDER BY avg desc LIMIT 5;
+--------------+------------+
| release_year | avg        |
+--------------+------------+
|         2006 | 3.26184368 |
|         2000 | 3.17556739 |
|         2001 | 3.09989142 |
|         2002 | 3.08259249 |
|         2008 | 2.99000049 |
+--------------+------------+

In some cases, you might want to group not just by a single field, but by multiple fields at once, such as a movie's category and year:

SQL
SELECT category_id, release_year, count(*) FROM films GROUP BY category_id, release_year ORDER BY category_id ASC, release_year ASC;
+-------------+--------------+----------+
| category_id | release_year | count(*) |
+-------------+--------------+----------+
|           1 |         2000 |        5 |
|           1 |         2001 |        2 |
|           1 |         2002 |        6 |
|           1 |         2003 |        6 |
|           1 |         2004 |        5 |
|           1 |         2005 |       10 |
|           1 |         2006 |        4 |
|           1 |         2007 |        5 |
|           1 |         2008 |        7 |
|           1 |         2009 |       14 |
|           2 |         2000 |       10 |
|           2 |         2001 |        5 |
|           2 |         2002 |        6 |
|           2 |         2003 |        6 |
|           2 |         2004 |       10 |
|           2 |         2005 |        4 |
|           2 |         2006 |        5 |
|           2 |         2007 |        8 |
|           2 |         2008 |        8 |
|           2 |         2009 |        4 |
+-------------+--------------+----------+
JSON
POST /search -d '
    {
    "size": 0,
    "index": "films", 
    "aggs": {
        "cat_release": {
            "composite": {
                "size":5,
                "sources": [
                    { "category": { "terms": { "field": "category_id" } } },
                    { "release year": { "terms": { "field": "release_year" } } }
                ]
            }
        }
    }
    }
'
{
  "took": 0,
  "timed_out": false,
  "hits": {
    "total": 1000,
    "total_relation": "eq",
    "hits": []
  },
  "aggregations": {
    "cat_release": {
      "after_key": {
        "category": 1,
        "release year": 2007
      },
      "buckets": [
        {
          "key": {
            "category": 1,
            "release year": 2008
          },
          "doc_count": 7
        },
        {
          "key": {
            "category": 1,
            "release year": 2009
          },
          "doc_count": 14
        },
        {
          "key": {
            "category": 1,
            "release year": 2005
          },
          "doc_count": 10
        },
        {
          "key": {
            "category": 1,
            "release year": 2004
          },
          "doc_count": 5
        },
        {
          "key": {
            "category": 1,
            "release year": 2007
          },
          "doc_count": 5
        }
      ]
    }
  }
}

Sometimes it's useful to see not just a single element per group, but multiple. This can be easily achieved with the help of GROUP N BY. For example, in the following case, we get two movies for each year rather than just one, which a simple GROUP BY release_year would have returned.

SQL
SELECT release_year, title FROM films GROUP 2 BY release_year ORDER BY release_year DESC LIMIT 6;
+--------------+-----------------------------+
| release_year | title                       |
+--------------+-----------------------------+
|         2009 | ALICE FANTASIA              |
|         2009 | ALIEN CENTER                |
|         2008 | AMADEUS HOLY                |
|         2008 | ANACONDA CONFESSIONS        |
|         2007 | ANGELS LIFE                 |
|         2007 | ARACHNOPHOBIA ROLLERCOASTER |
+--------------+-----------------------------+

Another crucial analytics requirement is to sort elements within a group. To achieve this, use the WITHIN GROUP ORDER BY ... {ASC|DESC} clause. For example, let's get the highest-rated film for each year. Note that it works in parallel with just ORDER BY:

These two work entirely independently.

SQL
SELECT release_year, title, rental_rate FROM films GROUP BY release_year WITHIN GROUP ORDER BY rental_rate DESC ORDER BY release_year DESC LIMIT 5;
+--------------+------------------+-------------+
| release_year | title            | rental_rate |
+--------------+------------------+-------------+
|         2009 | AMERICAN CIRCUS  |    4.990000 |
|         2008 | ANTHEM LUKE      |    4.990000 |
|         2007 | ATTACKS HATE     |    4.990000 |
|         2006 | ALADDIN CALENDAR |    4.990000 |
|         2005 | AIRPLANE SIERRA  |    4.990000 |
+--------------+------------------+-------------+

HAVING expression is a helpful clause for filtering groups. While WHERE is applied before grouping, HAVING works with the groups. For example, let's keep only those years when the average rental rate of the films for that year was higher than 3. We get only four years:

SQL
SELECT release_year, avg(rental_rate) avg FROM films GROUP BY release_year HAVING avg > 3;
+--------------+------------+
| release_year | avg        |
+--------------+------------+
|         2002 | 3.08259249 |
|         2001 | 3.09989142 |
|         2000 | 3.17556739 |
|         2006 | 3.26184368 |
+--------------+------------+

There is a function GROUPBY() which returns the key of the current group. It's useful in many cases, especially when you GROUP BY an MVA or a JSON value.

It can also be used in HAVING, for example, to keep only years 2000 and 2002.

Note that GROUPBY()is not recommended for use when you GROUP BY multiple fields at once. It will still work, but since the group key in this case is a compound of field values, it may not appear the way you expect.

SQL
SELECT release_year, count(*) FROM films GROUP BY release_year HAVING GROUPBY() IN (2000, 2002);
+--------------+----------+
| release_year | count(*) |
+--------------+----------+
|         2002 |      108 |
|         2000 |       97 |
+--------------+----------+

Manticore supports grouping by MVA. To demonstrate how it works, let's create a table "shoes" with MVA "sizes" and insert a few documents into it:

create table shoes(title text, sizes multi);
insert into shoes values(0,'nike',(40,41,42)),(0,'adidas',(41,43)),(0,'reebook',(42,43));

so we have:

SELECT * FROM shoes;
+---------------------+----------+---------+
| id                  | sizes    | title   |
+---------------------+----------+---------+
| 1657851069130080265 | 40,41,42 | nike    |
| 1657851069130080266 | 41,43    | adidas  |
| 1657851069130080267 | 42,43    | reebook |
+---------------------+----------+---------+

If we now GROUP BY "sizes", it will process all our multi-value attributes and return an aggregation for each, in this case just the count:

SQL
SELECT groupby() gb, count(*) FROM shoes GROUP BY sizes ORDER BY gb asc;
+------+----------+
| gb   | count(*) |
+------+----------+
|   40 |        1 |
|   41 |        2 |
|   42 |        2 |
|   43 |        2 |
+------+----------+
JSON
POST /search -d '
    {
     "index" : "shoes",
     "limit": 0,
     "aggs" :
     {
        "sizes" :
         {
            "terms" :
             {
              "field":"sizes",
              "size":100
             }
         }
     }
    }
'
{
  "took": 0,
  "timed_out": false,
  "hits": {
    "total": 3,
    "hits": [

    ]
  },
  "aggregations": {
    "sizes": {
      "buckets": [
        {
          "key": 43,
          "doc_count": 2
        },
        {
          "key": 42,
          "doc_count": 2
        },
        {
          "key": 41,
          "doc_count": 2
        },
        {
          "key": 40,
          "doc_count": 1
        }
      ]
    }
  }
}
PHP
$index->setName('shoes');
$search = $index->search('');
$search->limit(0);
$search->facet('sizes','sizes',100);
$results = $search->get();
print_r($results->getFacets());
Array
(
    [sizes] => Array
        (
            [buckets] => Array
                (
                    [0] => Array
                        (
                            [key] => 43
                            [doc_count] => 2
                        )
                    [1] => Array
                        (
                            [key] => 42
                            [doc_count] => 2
                        )
                    [2] => Array
                        (
                            [key] => 41
                            [doc_count] => 2
                        )
                    [3] => Array
                        (
                            [key] => 40
                            [doc_count] => 1
                        )
                )
        )
)
Python
res =searchApi.search({"index":"shoes","limit":0,"aggs":{"sizes":{"terms":{"field":"sizes","size":100}}}})
{'aggregations': {u'sizes': {u'buckets': [{u'doc_count': 2, u'key': 43},
                                          {u'doc_count': 2, u'key': 42},
                                          {u'doc_count': 2, u'key': 41},
                                          {u'doc_count': 1, u'key': 40}]}},
 'hits': {'hits': [], 'max_score': None, 'total': 3},
 'profile': None,
 'timed_out': False,
 'took': 0}
Javascript
res = await searchApi.search({"index":"shoes","limit":0,"aggs":{"sizes":{"terms":{"field":"sizes","size":100}}}});
{"took":0,"timed_out":false,"aggregations":{"sizes":{"buckets":[{"key":43,"doc_count":2},{"key":42,"doc_count":2},{"key":41,"doc_count":2},{"key":40,"doc_count":1}]}},"hits":{"total":3,"hits":[]}}
Java
HashMap<String,Object> aggs = new HashMap<String,Object>(){{
    put("release_year", new HashMap<String,Object>(){{
        put("terms", new HashMap<String,Object>(){{
            put("field","release_year");
            put("size",100);
        }});
    }});
}};

searchRequest = new SearchRequest();
searchRequest.setIndex("films");        
searchRequest.setLimit(0);
query = new HashMap<String,Object>();
query.put("match_all",null);
searchRequest.setQuery(query);
searchRequest.setAggs(aggs);
searchResponse = searchApi.search(searchRequest);
class SearchResponse {
    took: 0
    timedOut: false
    aggregations: {release_year={buckets=[{key=43, doc_count=2}, {key=42, doc_count=2}, {key=41, doc_count=2}, {key=40, doc_count=1}]}}
    hits: class SearchResponseHits {
        maxScore: null
        total: 3
        hits: []
    }
    profile: null
}
C#
var agg = new Aggregation("release_year", "release_year");
agg.Size = 100;
object query = new { match_all=null };
var searchRequest = new SearchRequest("films", query);
searchRequest.Limit = 0;
searchRequest.Aggs = new List<Aggregation> {agg};
var searchResponse = searchApi.Search(searchRequest);
class SearchResponse {
    took: 0
    timedOut: false
    aggregations: {release_year={buckets=[{key=43, doc_count=2}, {key=42, doc_count=2}, {key=41, doc_count=2}, {key=40, doc_count=1}]}}
    hits: class SearchResponseHits {
        maxScore: null
        total: 3
        hits: []
    }
    profile: null
}
TypeScript
res = await searchApi.search({
  index: 'test',
  aggs: {
    mva_agg: {
      terms: { field: "mva_field", size: 2 }
    }
  }
});
{
    "took":0,
    "timed_out":false,
    "aggregations":
    {
        "mva_agg":
        {
            "buckets":
            [{
                "key":1,
                "doc_count":4
            },
            {
                "key":2,
                "doc_count":2
            }]
        }
    },
    "hits":
    {
        "total":4,
        "hits":[]
    }
}
Go
query := map[string]interface{} {};
searchRequest.SetQuery(query);
aggTerms := manticoreclient.NewAggregationTerms()
aggTerms.SetField("mva_field")
aggTerms.SetSize(2)
aggregation := manticoreclient.NewAggregation()
aggregation.setTerms(aggTerms)
searchRequest.SetAggregation(aggregation)
res, _, _ := apiClient.SearchAPI.Search(context.Background()).SearchRequest(*searchRequest).Execute()
{
    "took":0,
    "timed_out":false,
    "aggregations":
    {
        "mva_agg":
        {
            "buckets":
            [{
                "key":1,
                "doc_count":4
            },
            {
                "key":2,
                "doc_count":2
            }]
        }
    },
    "hits":
    {
        "total":5,
        "hits":[]
    }
}

If you have a field of type JSON, you can GROUP BY any node from it. To demonstrate this, let's create a table "products" with a few documents, each having a color in the "meta" JSON field:

create table products(title text, meta json);
insert into products values(0,'nike','{"color":"red"}'),(0,'adidas','{"color":"red"}'),(0,'puma','{"color":"green"}');

This gives us:

SELECT * FROM products;
+---------------------+-------------------+--------+
| id                  | meta              | title  |
+---------------------+-------------------+--------+
| 1657851069130080268 | {"color":"red"}   | nike   |
| 1657851069130080269 | {"color":"red"}   | adidas |
| 1657851069130080270 | {"color":"green"} | puma   |
+---------------------+-------------------+--------+

To group the products by color, we can simply use GROUP BY meta.color, and to display the corresponding group key in the SELECT list, we can use GROUPBY():

SQL
SELECT groupby() color, count(*) from products GROUP BY meta.color;
+-------+----------+
| color | count(*) |
+-------+----------+
| red   |        2 |
| green |        1 |
+-------+----------+
JSON
POST /search -d '
    {
     "index" : "products",
     "limit": 0,
     "aggs" :
     {
        "color" :
         {
            "terms" :
             {
              "field":"meta.color",
              "size":100
             }
         }
     }
    }
'
{
  "took": 0,
  "timed_out": false,
  "hits": {
    "total": 3,
    "hits": [

    ]
  },
  "aggregations": {
    "color": {
      "buckets": [
        {
          "key": "green",
          "doc_count": 1
        },
        {
          "key": "red",
          "doc_count": 2
        }
      ]
    }
  }
}
PHP
$index->setName('products');
$search = $index->search('');
$search->limit(0);
$search->facet('meta.color','color',100);
$results = $search->get();
print_r($results->getFacets());
Array
(
    [color] => Array
        (
            [buckets] => Array
                (
                    [0] => Array
                        (
                            [key] => green
                            [doc_count] => 1
                        )
                    [1] => Array
                        (
                            [key] => red
                            [doc_count] => 2
                        )
                )
        )
)
Python
res =searchApi.search({"index":"products","limit":0,"aggs":{"color":{"terms":{"field":"meta.color","size":100}}}})
{'aggregations': {u'color': {u'buckets': [{u'doc_count': 1,
                                           u'key': u'green'},
                                          {u'doc_count': 2, u'key': u'red'}]}},
 'hits': {'hits': [], 'max_score': None, 'total': 3},
 'profile': None,
 'timed_out': False,
 'took': 0}
Javascript
res = await searchApi.search({"index":"products","limit":0,"aggs":{"color":{"terms":{"field":"meta.color","size":100}}}});
{"took":0,"timed_out":false,"aggregations":{"color":{"buckets":[{"key":"green","doc_count":1},{"key":"red","doc_count":2}]}},"hits":{"total":3,"hits":[]}}
Java
HashMap<String,Object> aggs = new HashMap<String,Object>(){{
    put("color", new HashMap<String,Object>(){{
        put("terms", new HashMap<String,Object>(){{
            put("field","meta.color");
            put("size",100);
        }});
    }});
}};

searchRequest = new SearchRequest();
searchRequest.setIndex("products");        
searchRequest.setLimit(0);
query = new HashMap<String,Object>();
query.put("match_all",null);
searchRequest.setQuery(query);
searchRequest.setAggs(aggs);
searchResponse = searchApi.search(searchRequest);
class SearchResponse {
    took: 0
    timedOut: false
    aggregations: {color={buckets=[{key=green, doc_count=1}, {key=red, doc_count=2}]}}
    hits: class SearchResponseHits {
        maxScore: null
        total: 3
        hits: []
    }
    profile: null
}
C#
var agg = new Aggregation("color", "meta.color");
agg.Size = 100;
object query = new { match_all=null };
var searchRequest = new SearchRequest("products", query);
searchRequest.Limit = 0;
searchRequest.Aggs = new List<Aggregation> {agg};
var searchResponse = searchApi.Search(searchRequest);
class SearchResponse {
    took: 0
    timedOut: false
    aggregations: {color={buckets=[{key=green, doc_count=1}, {key=red, doc_count=2}]}}
    hits: class SearchResponseHits {
        maxScore: null
        total: 3
        hits: []
    }
    profile: null
}
TypeScript
res = await searchApi.search({
  index: 'test',
  aggs: {
    json_agg: {
      terms: { field: "json_field.year", size: 1 }
    }
  }
});
{
    "took":0,
    "timed_out":false,
    "aggregations":
    {
        "json_agg":
        {
            "buckets":
            [{
                "key":2000,
                "doc_count":2
            },
            {
                "key":2001,
                "doc_count":2
            }]
        }
    },
    "hits":
    {
        "total":4,
        "hits":[]
    }
}
Go
query := map[string]interface{} {};
searchRequest.SetQuery(query);
aggTerms := manticoreclient.NewAggregationTerms()
aggTerms.SetField("json_field.year")
aggTerms.SetSize(2)
aggregation := manticoreclient.NewAggregation()
aggregation.setTerms(aggTerms)
searchRequest.SetAggregation(aggregation)
res, _, _ := apiClient.SearchAPI.Search(context.Background()).SearchRequest(*searchRequest).Execute()
{
    "took":0,
    "timed_out":false,
    "aggregations":
    {
        "json_agg":
        {
            "buckets":
            [{
                "key":2000,
                "doc_count":2
            },
            {
                "key":2001,
                "doc_count":2
            }]
        }
    },
    "hits":
    {
        "total":4,
        "hits":[]
    }
}

Aggregation functions

Besides COUNT(*), which returns the number of elements in each group, you can use various other aggregation functions:

While COUNT(*) returns the number of all elements in the group, COUNT(DISTINCT field) returns the number of unique values of the field in the group, which may be completely different from the total count. For instance, you can have 100 elements in the group, but all with the same value for a certain field. COUNT(DISTINCT field) helps to determine that. To demonstrate this, let's create a table "students" with the student's name, age, and major:

CREATE TABLE students(name text, age int, major string);
INSERT INTO students values(0,'John',21,'arts'),(0,'William',22,'business'),(0,'Richard',21,'cs'),(0,'Rebecca',22,'cs'),(0,'Monica',21,'arts');

so we have:

MySQL [(none)]> SELECT * from students;
+---------------------+------+----------+---------+
| id                  | age  | major    | name    |
+---------------------+------+----------+---------+
| 1657851069130080271 |   21 | arts     | John    |
| 1657851069130080272 |   22 | business | William |
| 1657851069130080273 |   21 | cs       | Richard |
| 1657851069130080274 |   22 | cs       | Rebecca |
| 1657851069130080275 |   21 | arts     | Monica  |
+---------------------+------+----------+---------+

In the example, you can see that if we GROUP BY major and display both COUNT(*) and COUNT(DISTINCT age), it becomes clear that there are two students who chose the major "cs" with two unique ages, but for the major "arts", there are also two students, yet only one unique age.

There can be at most one COUNT(DISTINCT) per query.

** By default, counts are approximate **

Actually, some of them are exact, while others are approximate. More on that below.

Manticore supports two algorithms for computing counts of distinct values. One is a legacy algorithm that uses a lot of memory and is usually slow. It collects {group; value} pairs, sorts them, and periodically discards duplicates. The benefit of this approach is that it guarantees exact counts within a plain table. You can enable it by setting the distinct_precision_threshold option to 0.

The other algorithm (enabled by default) loads counts into a hash table and returns its size. If the hash table becomes too large, its contents are moved into a HyperLogLog. This is where the counts become approximate since HyperLogLog is a probabilistic algorithm. The advantage is that the maximum memory usage per group is fixed and depends on the accuracy of the HyperLogLog. The overall memory usage also depends on the max_matches setting, which reflects the number of groups.

The distinct_precision_threshold option sets the threshold below which counts are guaranteed to be exact. The HyperLogLog accuracy setting and the threshold for the "hash table to HyperLogLog" conversion are derived from this setting. It's important to use this option with caution because doubling it will double the maximum memory required for count calculations. The maximum memory usage can be roughly estimated using this formula: 64 * max_matches * distinct_precision_threshold. Note that this is the worst-case scenario, and in most cases, count calculations will use significantly less RAM.

COUNT(DISTINCT) against a distributed table or a real-time table consisting of multiple disk chunks may return inaccurate results, but the result should be accurate for a distributed table consisting of local plain or real-time tables with the same schema (identical set/order of fields, but may have different tokenization settings).

SQL
SELECT major, count(*), count(distinct age) FROM students GROUP BY major;
+----------+----------+---------------------+
| major    | count(*) | count(distinct age) |
+----------+----------+---------------------+
| arts     |        2 |                   1 |
| business |        1 |                   1 |
| cs       |        2 |                   2 |
+----------+----------+---------------------+

Often, you want to better understand the contents of each group. You can use GROUP N BY for that, but it would return additional rows you might not want in the output. GROUP_CONCAT() enriches your grouping by concatenating values of a specific field in the group. Let's take the previous example and improve it by displaying all the ages in each group.

GROUP_CONCAT(field) returns the list as comma-separated values.

SQL
SELECT major, count(*), count(distinct age), group_concat(age) FROM students GROUP BY major
+----------+----------+---------------------+-------------------+
| major    | count(*) | count(distinct age) | group_concat(age) |
+----------+----------+---------------------+-------------------+
| arts     |        2 |                   1 | 21,21             |
| business |        1 |                   1 | 22                |
| cs       |        2 |                   2 | 21,22             |
+----------+----------+---------------------+-------------------+

Of course, you can also obtain the sum, average, minimum, and maximum values within a group.

SQL
SELECT release_year year, sum(rental_rate) sum, min(rental_rate) min, max(rental_rate) max, avg(rental_rate) avg FROM films GROUP BY release_year ORDER BY year asc LIMIT 5;
+------+------------+----------+----------+------------+
| year | sum        | min      | max      | avg        |
+------+------------+----------+----------+------------+
| 2000 | 308.030029 | 0.990000 | 4.990000 | 3.17556739 |
| 2001 | 282.090118 | 0.990000 | 4.990000 | 3.09989142 |
| 2002 | 332.919983 | 0.990000 | 4.990000 | 3.08259249 |
| 2003 | 310.940063 | 0.990000 | 4.990000 | 2.93339682 |
| 2004 | 300.920044 | 0.990000 | 4.990000 | 2.78629661 |
+------+------------+----------+----------+------------+

Grouping accuracy

Grouping is done in fixed memory, which depends on the max_matches setting. If max_matches allows for storage of all found groups, the results will be 100% accurate. However, if the value of max_matches is lower, the results will be less accurate.

When parallel processing is involved, it can become more complicated. When pseudo_sharding is enabled and/or when using an RT table with several disk chunks, each chunk or pseudo shard gets a result set that is no larger than max_matches. This can lead to inaccuracies in aggregates and group counts when the result sets from different threads are merged. To fix this, either a larger max_matches value or disabling parallel processing can be used.

Manticore will try to increase max_matches up to max_matches_increase_threshold if it detects that groupby may return inaccurate results. Detection is based on the number of unique values of the groupby attribute, which is retrieved from secondary indexes (if present).

To ensure accurate aggregates and/or group counts when using RT tables or pseudo_sharding, accurate_aggregation can be enabled. This will try to increase max_matches up to the threshold, and if the threshold is not high enough, Manticore will disable parallel processing for the query.

SQL
MySQL [(none)]> SELECT release_year year, count(*) FROM films GROUP BY year limit 5;
+------+----------+
| year | count(*) |
+------+----------+
| 2004 |      108 |
| 2002 |      108 |
| 2001 |       91 |
| 2005 |       93 |
| 2000 |       97 |
+------+----------+

MySQL [(none)]> SELECT release_year year, count(*) FROM films GROUP BY year limit 5 option max_matches=1;
+------+----------+
| year | count(*) |
+------+----------+
| 2004 |       76 |
+------+----------+

MySQL [(none)]> SELECT release_year year, count(*) FROM films GROUP BY year limit 5 option max_matches=2;
+------+----------+
| year | count(*) |
+------+----------+
| 2004 |       76 |
| 2002 |       74 |
+------+----------+

MySQL [(none)]> SELECT release_year year, count(*) FROM films GROUP BY year limit 5 option max_matches=3;
+------+----------+
| year | count(*) |
+------+----------+
| 2004 |      108 |
| 2002 |      108 |
| 2001 |       91 |
+------+----------+
¶ 13.16

Percolate Query

Percolate queries are also known as Persistent queries, Prospective search, document routing, search in reverse, and inverse search.

The traditional way of conducting searches involves storing documents and performing search queries against them. However, there are cases where we want to apply a query to a newly incoming document to signal a match. Some scenarios where this is desired include monitoring systems that collect data and notify users about specific events, such as reaching a certain threshold for a metric or a particular value appearing in the monitored data. Another example is news aggregation, where users may want to be notified only about certain categories or topics, or even specific "keywords."

In these situations, traditional search is not the best fit, as it assumes the desired search is performed over the entire collection. This process gets multiplied by the number of users, resulting in many queries running over the entire collection, which can cause significant additional load. The alternative approach described in this section involves storing the queries instead and testing them against an incoming new document or a batch of documents.

Google Alerts, AlertHN, Bloomberg Terminal, and other systems that allow users to subscribe to specific content utilize similar technology.

Performing a percolate query with CALL PQ

The key thing to remember about percolate queries is that your search queries are already in the table. What you need to provide are documents to check if any of them match any of the stored rules.

You can perform a percolate query via SQL or JSON interfaces, as well as using programming language clients. The SQL approach offers more flexibility, while the HTTP method is simpler and provides most of what you need. The table below can help you understand the differences.

Desired Behavior SQL HTTP PHP
Provide a single document CALL PQ('tbl', '{doc1}') query.percolate.document{doc1} $client->pq()->search([$percolate])
Provide a single document (alternative) CALL PQ('tbl', 'doc1', 0 as docs_json) -
Provide multiple documents CALL PQ('tbl', ('doc1', 'doc2'), 0 as docs_json) query.percolate.documents[{doc1}, {doc2}] $client->pq()->search([$percolate])
Provide multiple documents (alternative) CALL PQ('tbl', ('{doc1}', '{doc2}')) - -
Provide multiple documents (alternative) CALL PQ('tbl', '[{doc1}, {doc2}]') - -
Return matching document ids 0/1 as docs (disabled by default) Enabled by default Enabled by default
Use document's own id to show in the result 'id field' as docs_id (disabled by default) Not available Not available
Consider input documents are JSON 1 as docs_json (1 by default) Enabled by default Enabled by default
Consider input documents are plain text 0 as docs_json (1 by default) Not available Not available
Sparsed distribution mode default default default
Sharded distribution mode sharded as mode Not available Not available
Return all info about matching query 1 as query (0 by default) Enabled by default Enabled by default
Skip invalid JSON 1 as skip_bad_json (0 by default) Not available Not available
Extended info in SHOW META 1 as verbose (0 by default) Not available Not available
Define the number which will be added to document ids if no docs_id fields provided (mostly relevant in distributed PQ modes) 1 as shift (0 by default) Not available Not available

To demonstrate how this works, here are a few examples. Let's create a PQ table with two fields:

and three rules in it:

SQL

SQL
CREATE TABLE products(title text, color string) type='pq';
INSERT INTO products(query) values('@title bag');
INSERT INTO products(query,filters) values('@title shoes', 'color=\'red\'');
INSERT INTO products(query,filters) values('@title shoes', 'color in (\'blue\', \'green\')');
select * from products;
+---------------------+--------------+------+---------------------------+
| id                  | query        | tags | filters                   |
+---------------------+--------------+------+---------------------------+
| 1657852401006149635 | @title shoes |      | color IN ('blue, 'green') |
| 1657852401006149636 | @title shoes |      | color='red'               |
| 1657852401006149637 | @title bag   |      |                           |
+---------------------+--------------+------+---------------------------+
JSON
PUT /pq/products/doc/
{
  "query": {
    "match": {
      "title": "bag"
    }
  },
  "filters": ""
}

PUT /pq/products/doc/
{
  "query": {
    "match": {
      "title": "shoes"
    }
  },
  "filters": "color='red'"
}

PUT /pq/products/doc/
{
  "query": {
    "match": {
      "title": "shoes"
    }
  },
  "filters": "color IN ('blue', 'green')"
}
#### JSON
{
  "index": "products",
  "type": "doc",
  "_id": "1657852401006149661",
  "result": "created"
}
{
  "index": "products",
  "type": "doc",
  "_id": "1657852401006149662",
  "result": "created"
}
{
  "index": "products",
  "type": "doc",
  "_id": "1657852401006149663",
  "result": "created"
}
#### PHP
PHP
$index = [
    'index' => 'products',
    'body' => [
        'columns' => [
            'title' => ['type' => 'text'],
            'color' => ['type' => 'string']
        ],
        'settings' => [
            'type' => 'pq'
        ]
    ]
];
$client->indices()->create($index);

$query = [
    'index' => 'products',
    'body' => [ 'query'=>['match'=>['title'=>'bag']]]
];
$client->pq()->doc($query);
$query = [
    'index' => 'products',
    'body' => [ 'query'=>['match'=>['title'=>'shoes']],'filters'=>"color='red'"]
];
$client->pq()->doc($query);


$query = [
    'index' => 'products',
    'body' => [ 'query'=>['match'=>['title'=>'shoes']],'filters'=>"color IN ('blue', 'green')"]
];
$client->pq()->doc($query);
Array(
  [index] => products
  [type] => doc
  [_id] => 1657852401006149661
  [result] => created
)
Array(
  [index] => products
  [type] => doc
  [_id] => 1657852401006149662
  [result] => created
)
Array(
  [index] => products
  [type] => doc
  [_id] => 1657852401006149663
  [result] => created
)
Python
Python
utilsApi.sql('create table products(title text, color string) type=\'pq\'')
indexApi.insert({"index" : "products", "doc" : {"query" : "@title bag" }})
indexApi.insert({"index" : "products",  "doc" : {"query" : "@title shoes", "filters": "color='red'" }})
indexApi.insert({"index" : "products",  "doc" : {"query" : "@title shoes","filters": "color IN ('blue', 'green')" }})
{'created': True,
 'found': None,
 'id': 0,
 'index': 'products',
 'result': 'created'}
{'created': True,
 'found': None,
 'id': 0,
 'index': 'products',
 'result': 'created'}
{'created': True,
 'found': None,
 'id': 0,
 'index': 'products',
 'result': 'created'}
javascript
javascript
res = await utilsApi.sql('create table products(title text, color string) type=\'pq\'');
res = indexApi.insert({"index" : "products", "doc" : {"query" : "@title bag" }});
res = indexApi.insert({"index" : "products",  "doc" : {"query" : "@title shoes", "filters": "color='red'" }});
res = indexApi.insert({"index" : "products",  "doc" : {"query" : "@title shoes","filters": "color IN ('blue', 'green')" }});
"_index":"products","_id":0,"created":true,"result":"created"}
{"_index":"products","_id":0,"created":true,"result":"created"}
{"_index":"products","_id":0,"created":true,"result":"created"}
java
Java
utilsApi.sql("create table products(title text, color string) type='pq'");
doc = new HashMap<String,Object>(){{
    put("query", "@title bag");
}};
newdoc = new InsertDocumentRequest();
newdoc.index("products").setDoc(doc);
indexApi.insert(newdoc);

doc = new HashMap<String,Object>(){{
    put("query", "@title shoes");
    put("filters", "color='red'");
}};
newdoc = new InsertDocumentRequest();
newdoc.index("products").setDoc(doc);
indexApi.insert(newdoc);

doc = new HashMap<String,Object>(){{
    put("query", "@title shoes");
    put("filters", "color IN ('blue', 'green')");
}};
newdoc = new InsertDocumentRequest();
newdoc.index("products").setDoc(doc);
indexApi.insert(newdoc);
{total=0, error=, warning=}
class SuccessResponse {
    index: products
    id: 0
    created: true
    result: created
    found: null
}
class SuccessResponse {
    index: products
    id: 0
    created: true
    result: created
    found: null
}
class SuccessResponse {
    index: products
    id: 0
    created: true
    result: created
    found: null
}
C#
C#
utilsApi.Sql("create table products(title text, color string) type='pq'");

Dictionary<string, Object> doc = new Dictionary<string, Object>(); 
doc.Add("query", "@title bag");
InsertDocumentRequest newdoc = new InsertDocumentRequest(index: "products", doc: doc);
indexApi.Insert(newdoc);

doc = new Dictionary<string, Object>(); 
doc.Add("query", "@title shoes");
doc.Add("filters", "color='red'");
newdoc = new InsertDocumentRequest(index: "products", doc: doc);
indexApi.Insert(newdoc);

doc = new Dictionary<string, Object>(); 
doc.Add("query", "@title bag");
doc.Add("filters", "color IN ('blue', 'green')");
newdoc = new InsertDocumentRequest(index: "products", doc: doc);
indexApi.Insert(newdoc);
{total=0, error="", warning=""}

class SuccessResponse {
    index: products
    id: 0
    created: true
    result: created
    found: null
}
class SuccessResponse {
    index: products
    id: 0
    created: true
    result: created
    found: null
}
class SuccessResponse {
    index: products
    id: 0
    created: true
    result: created
    found: null
}
TypeScript
TypeScript
res = await utilsApi.sql("create table test_pq(title text, color string) type='pq'");
res = indexApi.insert({
  index: 'test_pq',
  doc: { query : '@title bag' }
});
res = indexApi.insert(
  index: 'test_pq',
  doc: { query: '@title shoes', filters: "color='red'" }
});
res = indexApi.insert({
  index: 'test_pq',
  doc: { query : '@title shoes', filters: "color IN ('blue', 'green')" }
});
{
    "_index":"test_pq",
    "_id":1657852401006149661,
    "created":true,
    "result":"created"
}
{
    "_index":"test_pq",
    "_id":1657852401006149662,
    "created":true,
    "result":"created"
}
{
    "_index":"test_pq",
    "_id":1657852401006149663,
    "created":true,
    "result":"created"
}
Go
Go
apiClient.UtilsAPI.Sql(context.Background()).Body("create table test_pq(title text, color string) type='pq'").Execute()

indexDoc := map[string]interface{} {"query": "@title bag"}
indexReq := manticoreclient.NewInsertDocumentRequest("test_pq", indexDoc)
apiClient.IndexAPI.Insert(context.Background()).InsertDocumentRequest(*indexReq).Execute();

indexDoc = map[string]interface{} {"query": "@title shoes", "filters": "color='red'"}
indexReq = manticoreclient.NewInsertDocumentRequest("test_pq", indexDoc)
apiClient.IndexAPI.Insert(context.Background()).InsertDocumentRequest(*indexReq).Execute();

indexDoc = map[string]interface{} {"query": "@title shoes", "filters": "color IN ('blue', 'green')"}
indexReq = manticoreclient.NewInsertDocumentRequest("test_pq", indexDoc)
apiClient.IndexAPI.Insert(context.Background()).InsertDocumentRequest(*indexReq).Execute();
{
    "_index":"test_pq",
    "_id":1657852401006149661,
    "created":true,
    "result":"created"
}
{
    "_index":"test_pq",
    "_id":1657852401006149662,
    "created":true,
    "result":"created"
}
{
    "_index":"test_pq",
    "_id":1657852401006149663,
    "created":true,
    "result":"created"
}

The first document doesn't match any rules. It could match the first two, but they require additional filters.

The second document matches one rule. Note that CALL PQ by default expects a document to be a JSON, but if you use 0 as docs_json, you can pass a plain string instead.

SQL:

SQL
CALL PQ('products', 'Beautiful shoes', 0 as docs_json);

CALL PQ('products', 'What a nice bag', 0 as docs_json);
CALL PQ('products', '{"title": "What a nice bag"}');
+---------------------+
| id                  |
+---------------------+
| 1657852401006149637 |
+---------------------+

+---------------------+
| id                  |
+---------------------+
| 1657852401006149637 |
+---------------------+
JSON:
JSON
POST /pq/products/_search
{
  "query": {
    "percolate": {
      "document": {
        "title": "What a nice bag"
      }
    }
  }
}
{
  "took": 0,
  "timed_out": false,
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "products",
        "_type": "doc",
        "_id": "1657852401006149644",
        "_score": "1",
        "_source": {
          "query": {
            "ql": "@title bag"
          }
        },
        "fields": {
          "_percolator_document_slot": [
            1
          ]
        }
      }
    ]
  }
}
PHP:
PHP
$percolate = [
    'index' => 'products',
    'body' => [
        'query' => [
            'percolate' => [
                'document' => [
                    'title' => 'What a nice bag'
                ]
            ]
        ]
    ]
];
$client->pq()->search($percolate);
Array
(
    [took] => 0
    [timed_out] =>
    [hits] => Array
        (
            [total] => 1
            [max_score] => 1
            [hits] => Array
                (
                    [0] => Array
                        (
                            [_index] => products
                            [_type] => doc
                            [_id] => 1657852401006149644
                            [_score] => 1
                            [_source] => Array
                                (
                                    [query] => Array
                                        (
                                            [match] => Array
                                                (
                                                    [title] => bag
                                                )
                                        )
                                )
                            [fields] => Array
                                (
                                    [_percolator_document_slot] => Array
                                        (
                                            [0] => 1
                                        )
                                )
                        )
                )
        )
)
Python
Python
searchApi.percolate('products',{"query":{"percolate":{"document":{"title":"What a nice bag"}}}})
{'hits': {'hits': [{u'_id': u'2811025403043381480',
                    u'_index': u'products',
                    u'_score': u'1',
                    u'_source': {u'query': {u'ql': u'@title bag'}},
                    u'_type': u'doc',
                    u'fields': {u'_percolator_document_slot': [1]}}],
          'total': 1},
 'profile': None,
 'timed_out': False,
 'took': 0}
javascript
javascript
res = await searchApi.percolate('products',{"query":{"percolate":{"document":{"title":"What a nice bag"}}}});
{
  "took": 0,
  "timed_out": false,
  "hits": {
    "total": 1,
    "hits": [
      {
        "_index": "products",
        "_type": "doc",
        "_id": "2811045522851233808",
        "_score": "1",
        "_source": {
          "query": {
            "ql": "@title bag"
          }
        },
        "fields": {
          "_percolator_document_slot": [
            1
          ]
        }
      }
    ]
  }
}
java
Java
PercolateRequest percolateRequest = new PercolateRequest();
query = new HashMap<String,Object>(){{
    put("percolate",new HashMap<String,Object >(){{
        put("document", new HashMap<String,Object >(){{
            put("title","what a nice bag");
        }});
    }});
}};
percolateRequest.query(query);
searchApi.percolate("test_pq",percolateRequest);
class SearchResponse {
    took: 0
    timedOut: false
    hits: class SearchResponseHits {
        total: 1
        maxScore: 1
        hits: [{_index=products, _type=doc, _id=2811045522851234109, _score=1, _source={query={ql=@title bag}}, fields={_percolator_document_slot=[1]}}]
        aggregations: null
    }
    profile: null
}
C#
C#
Dictionary<string, Object> percolateDoc = new Dictionary<string, Object>(); 
percolateDoc.Add("document", new Dictionary<string, Object> {{ "title", "what a nice bag" }});
Dictionary<string, Object> query = new Dictionary<string, Object> {{ "percolate", percolateDoc }}; 
PercolateRequest percolateRequest = new PercolateRequest(query=query);
searchApi.Percolate("test_pq",percolateRequest);
class SearchResponse {
    took: 0
    timedOut: false
    hits: class SearchResponseHits {
        total: 1
        maxScore: 1
        hits: [{_index=products, _type=doc, _id=2811045522851234109, _score=1, _source={query={ql=@title bag}}, fields={_percolator_document_slot=[1]}}]
        aggregations: null
    }
    profile: null
}
TypeScript
TypeScript
res = await searchApi.percolate('test_pq', { query: { percolate: { document : { title : 'What a nice bag' } } } } );
{
  "took": 0,
  "timed_out": false,
  "hits": {
    "total": 1,
    "hits": [
      {
        "_index": "test_pq",
        "_type": "doc",
        "_id": "1657852401006149661",
        "_score": "1",
        "_source": {
          "query": {
            "ql": "@title bag"
          }
        },
        "fields": {
          "_percolator_document_slot": [
            1
          ]
        }
      }
    ]
  }
}
Go
Go
query := map[string]interface{} {"title": "what a nice bag"}
percolateRequestQuery := manticoreclient.NewPercolateQuery(query)
percolateRequest := manticoreclient.NewPercolateRequest(percolateRequestQuery)
res, _, _ := apiClient.SearchAPI.Percolate(context.Background(), "test_pq").PercolateRequest(*percolateRequest).Execute()
{
  "took": 0,
  "timed_out": false,
  "hits": {
    "total": 1,
    "hits": [
      {
        "_index": "test_pq",
        "_type": "doc",
        "_id": "1657852401006149661",
        "_score": "1",
        "_source": {
          "query": {
            "ql": "@title bag"
          }
        },
        "fields": {
          "_percolator_document_slot": [
            1
          ]
        }
      }
    ]
  }
}

SQL:

SQL
CALL PQ('products', '{"title": "What a nice bag"}', 1 as query);
+---------------------+------------+------+---------+
| id                  | query      | tags | filters |
+---------------------+------------+------+---------+
| 1657852401006149637 | @title bag |      |         |
+---------------------+------------+------+---------+
JSON:
JSON
POST /pq/products/_search
{
  "query": {
    "percolate": {
      "document": {
        "title": "What a nice bag"
      }
    }
  }
}
{
  "took": 0,
  "timed_out": false,
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "products",
        "_type": "doc",
        "_id": "1657852401006149644",
        "_score": "1",
        "_source": {
          "query": {
            "ql": "@title bag"
          }
        },
        "fields": {
          "_percolator_document_slot": [
            1
          ]
        }
      }
    ]
  }
}
PHP:
PHP
$percolate = [
    'index' => 'products',
    'body' => [
        'query' => [
            'percolate' => [
                'document' => [
                    'title' => 'What a nice bag'
                ]
            ]
        ]
    ]
];
$client->pq()->search($percolate);
Array
(
    [took] => 0
    [timed_out] =>
    [hits] => Array
        (
            [total] => 1
            [max_score] => 1
            [hits] => Array
                (
                    [0] => Array
                        (
                            [_index] => products
                            [_type] => doc
                            [_id] => 1657852401006149644
                            [_score] => 1
                            [_source] => Array
                                (
                                    [query] => Array
                                        (
                                            [match] => Array
                                                (
                                                    [title] => bag
                                                )
                                        )
                                )
                            [fields] => Array
                                (
                                    [_percolator_document_slot] => Array
                                        (
                                            [0] => 1
                                        )
                                )
                        )
                )
        )
)
Python
Python
searchApi.percolate('products',{"query":{"percolate":{"document":{"title":"What a nice bag"}}}})
{'hits': {'hits': [{u'_id': u'2811025403043381480',
                    u'_index': u'products',
                    u'_score': u'1',
                    u'_source': {u'query': {u'ql': u'@title bag'}},
                    u'_type': u'doc',
                    u'fields': {u'_percolator_document_slot': [1]}}],
          'total': 1},
 'profile': None,
 'timed_out': False,
 'took': 0}
javascript
javascript
res = await searchApi.percolate('products',{"query":{"percolate":{"document":{"title":"What a nice bag"}}}});
{
  "took": 0,
  "timed_out": false,
  "hits": {
    "total": 1,
    "hits": [
      {
        "_index": "products",
        "_type": "doc",
        "_id": "2811045522851233808",
        "_score": "1",
        "_source": {
          "query": {
            "ql": "@title bag"
          }
        },
        "fields": {
          "_percolator_document_slot": [
            1
          ]
        }
      }
    ]
  }
}
java
Java
PercolateRequest percolateRequest = new PercolateRequest();
query = new HashMap<String,Object>(){{
    put("percolate",new HashMap<String,Object >(){{
        put("document", new HashMap<String,Object >(){{
            put("title","what a nice bag");
        }});
    }});
}};
percolateRequest.query(query);
searchApi.percolate("test_pq",percolateRequest);
class SearchResponse {
    took: 0
    timedOut: false
    hits: class SearchResponseHits {
        total: 1
        maxScore: 1
        hits: [{_index=products, _type=doc, _id=2811045522851234109, _score=1, _source={query={ql=@title bag}}, fields={_percolator_document_slot=[1]}}]
        aggregations: null
    }
    profile: null
}
C#
C#
Dictionary<string, Object> percolateDoc = new Dictionary<string, Object>(); 
percolateDoc.Add("document", new Dictionary<string, Object> {{ "title", "what a nice bag" }});
Dictionary<string, Object> query = new Dictionary<string, Object> {{ "percolate", percolateDoc }}; 
PercolateRequest percolateRequest = new PercolateRequest(query=query);
searchApi.Percolate("test_pq",percolateRequest);
class SearchResponse {
    took: 0
    timedOut: false
    hits: class SearchResponseHits {
        total: 1
        maxScore: 1
        hits: [{_index=products, _type=doc, _id=2811045522851234109, _score=1, _source={query={ql=@title bag}}, fields={_percolator_document_slot=[1]}}]
        aggregations: null
    }
    profile: null
}
TypeScript
TypeScript
res = await searchApi.percolate('test_pq', { query: { percolate: { document : { title : 'What a nice bag' } } } } );
{
  "took": 0,
  "timed_out": false,
  "hits": {
    "total": 1,
    "hits": [
      {
        "_index": "test_pq",
        "_type": "doc",
        "_id": "1657852401006149661",
        "_score": "1",
        "_source": {
          "query": {
            "ql": "@title bag"
          }
        },
        "fields": {
          "_percolator_document_slot": [
            1
          ]
        }
      }
    ]
  }
}
Go
Go
query := map[string]interface{} {"title": "what a nice bag"}
percolateRequestQuery := manticoreclient.NewPercolateQuery(query)
percolateRequest := manticoreclient.NewPercolateRequest(percolateRequestQuery)
res, _, _ := apiClient.SearchAPI.Percolate(context.Background(), "test_pq").PercolateRequest(*percolateRequest).Execute()
{
  "took": 0,
  "timed_out": false,
  "hits": {
    "total": 1,
    "hits": [
      {
        "_index": "test_pq",
        "_type": "doc",
        "_id": "1657852401006149661",
        "_score": "1",
        "_source": {
          "query": {
            "ql": "@title bag"
          }
        },
        "fields": {
          "_percolator_document_slot": [
            1
          ]
        }
      }
    ]
  }
}

Note that with CALL PQ, you can provide multiple documents in different ways:

SQL:

SQL
CALL PQ('products', ('nice pair of shoes', 'beautiful bag'), 1 as query, 0 as docs_json);

CALL PQ('products', ('{"title": "nice pair of shoes", "color": "red"}', '{"title": "beautiful bag"}'), 1 as query);

CALL PQ('products', '[{"title": "nice pair of shoes", "color": "blue"}, {"title": "beautiful bag"}]', 1 as query);
+---------------------+------------+------+---------+
| id                  | query      | tags | filters |
+---------------------+------------+------+---------+
| 1657852401006149637 | @title bag |      |         |
+---------------------+------------+------+---------+

+---------------------+--------------+------+-------------+
| id                  | query        | tags | filters     |
+---------------------+--------------+------+-------------+
| 1657852401006149636 | @title shoes |      | color='red' |
| 1657852401006149637 | @title bag   |      |             |
+---------------------+--------------+------+-------------+

+---------------------+--------------+------+---------------------------+
| id                  | query        | tags | filters                   |
+---------------------+--------------+------+---------------------------+
| 1657852401006149635 | @title shoes |      | color IN ('blue, 'green') |
| 1657852401006149637 | @title bag   |      |                           |
+---------------------+--------------+------+---------------------------+
JSON:
JSON
POST /pq/products/_search
{
  "query": {
    "percolate": {
      "documents": [
        {"title": "nice pair of shoes", "color": "blue"},
        {"title": "beautiful bag"}
      ]
    }
  }
}
{
  "took": 0,
  "timed_out": false,
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "products",
        "_type": "doc",
        "_id": "1657852401006149644",
        "_score": "1",
        "_source": {
          "query": {
            "ql": "@title bag"
          }
        },
        "fields": {
          "_percolator_document_slot": [
            2
          ]
        }
      },
      {
        "_index": "products",
        "_type": "doc",
        "_id": "1657852401006149646",
        "_score": "1",
        "_source": {
          "query": {
            "ql": "@title shoes"
          }
        },
        "fields": {
          "_percolator_document_slot": [
            1
          ]
        }
      }
    ]
  }
}
PHP:
PHP
$percolate = [
    'index' => 'products',
    'body' => [
        'query' => [
            'percolate' => [
                'documents' => [
                    ['title' => 'nice pair of shoes','color'=>'blue'],
                    ['title' => 'beautiful bag']
                ]
            ]
        ]
    ]
];
$client->pq()->search($percolate);
Array
(
    [took] => 23
    [timed_out] =>
    [hits] => Array
        (
            [total] => 2
            [max_score] => 1
            [hits] => Array
                (
                    [0] => Array
                        (
                            [_index] => products
                            [_type] => doc
                            [_id] => 2810781492890828819
                            [_score] => 1
                            [_source] => Array
                                (
                                    [query] => Array
                                        (
                                            [match] => Array
                                                (
                                                    [title] => bag
                                                )
                                        )
                                )
                            [fields] => Array
                                (
                                    [_percolator_document_slot] => Array
                                        (
                                            [0] => 2
                                        )
                                )
                        )
                    [1] => Array
                        (
                            [_index] => products
                            [_type] => doc
                            [_id] => 2810781492890828821
                            [_score] => 1
                            [_source] => Array
                                (
                                    [query] => Array
                                        (
                                            [match] => Array
                                                (
                                                    [title] => shoes
                                                )
                                        )
                                )
                            [fields] => Array
                                (
                                    [_percolator_document_slot] => Array
                                        (
                                            [0] => 1
                                        )
                                )
                        )
                )
        )
)
Python
Python
searchApi.percolate('products',{"query":{"percolate":{"documents":[{"title":"nice pair of shoes","color":"blue"},{"title":"beautiful bag"}]}}})
{'hits': {'hits': [{u'_id': u'2811025403043381494',
                    u'_index': u'products',
                    u'_score': u'1',
                    u'_source': {u'query': {u'ql': u'@title bag'}},
                    u'_type': u'doc',
                    u'fields': {u'_percolator_document_slot': [2]}},
                   {u'_id': u'2811025403043381496',
                    u'_index': u'products',
                    u'_score': u'1',
                    u'_source': {u'query': {u'ql': u'@title shoes'}},
                    u'_type': u'doc',
                    u'fields': {u'_percolator_document_slot': [1]}}],
          'total': 2},
 'profile': None,
 'timed_out': False,
 'took': 0}
javascript
javascript
res = await searchApi.percolate('products',{"query":{"percolate":{"documents":[{"title":"nice pair of shoes","color":"blue"},{"title":"beautiful bag"}]}}});
{
  "took": 6,
  "timed_out": false,
  "hits": {
    "total": 2,
    "hits": [
      {
        "_index": "products",
        "_type": "doc",
        "_id": "2811045522851233808",
        "_score": "1",
        "_source": {
          "query": {
            "ql": "@title bag"
          }
        },
        "fields": {
          "_percolator_document_slot": [
            2
          ]
        }
      },
      {
        "_index": "products",
        "_type": "doc",
        "_id": "2811045522851233810",
        "_score": "1",
        "_source": {
          "query": {
            "ql": "@title shoes"
          }
        },
        "fields": {
          "_percolator_document_slot": [
            1
          ]
        }
      }
    ]
  }
}
java
Java
percolateRequest = new PercolateRequest();
query = new HashMap<String,Object>(){{
        put("percolate",new HashMap<String,Object >(){{
            put("documents", new ArrayList<Object>(){{
                    add(new HashMap<String,Object >(){{
                        put("title","nice pair of shoes");
                        put("color","blue");
                    }});
                    add(new HashMap<String,Object >(){{
                        put("title","beautiful bag");

                    }});

                     }});
        }});
    }};
percolateRequest.query(query);
searchApi.percolate("products",percolateRequest);
class SearchResponse {
    took: 0
    timedOut: false
    hits: class SearchResponseHits {
        total: 2
        maxScore: 1
        hits: [{_index=products, _type=doc, _id=2811045522851234133, _score=1, _source={query={ql=@title bag}}, fields={_percolator_document_slot=[2]}}, {_index=products, _type=doc, _id=2811045522851234135, _score=1, _source={query={ql=@title shoes}}, fields={_percolator_document_slot=[1]}}]
        aggregations: null
    }
    profile: null
}
C#
C#
var doc1 = new Dictionary<string, Object>();
doc1.Add("title","nice pair of shoes");
doc1.Add("color","blue");
var doc2 = new Dictionary<string, Object>();
doc2.Add("title","beautiful bag");
var docs = new List<Object> {doc1, doc2};
Dictionary<string, Object> percolateDoc = new Dictionary<string, Object> {{ "documents", docs }}; 
Dictionary<string, Object> query = new Dictionary<string, Object> {{ "percolate", percolateDoc }}; 
PercolateRequest percolateRequest = new PercolateRequest(query=query);
searchApi.Percolate("products",percolateRequest);
class SearchResponse {
    took: 0
    timedOut: false
    hits: class SearchResponseHits {
        total: 2
        maxScore: 1
        hits: [{_index=products, _type=doc, _id=2811045522851234133, _score=1, _source={query={ql=@title bag}}, fields={_percolator_document_slot=[2]}}, {_index=products, _type=doc, _id=2811045522851234135, _score=1, _source={query={ql=@title shoes}}, fields={_percolator_document_slot=[1]}}]
        aggregations: null
    }
    profile: null
}
TypeScript
TypeScript
docs = [ {title : 'What a nice bag'}, {title : 'Really nice shoes'} ]; 
res = await searchApi.percolate('test_pq', { query: { percolate: { documents : docs } } } );
{
  "took": 0,
  "timed_out": false,
  "hits": {
    "total": 2,
    "hits": [
      {
        "_index": "test_pq",
        "_type": "doc",
        "_id": "1657852401006149661",
        "_score": "1",
        "_source": {
          "query": {
            "ql": "@title bag"
          }
        },
        "fields": {
          "_percolator_document_slot": [
            1
          ]
        }
      },
      {
        "_index": "test_pq",
        "_type": "doc",
        "_id": "1657852401006149662",
        "_score": "1",
        "_source": {
          "query": {
            "ql": "@title shoes"
          }
        },
        "fields": {
          "_percolator_document_slot": [
            1
          ]
        }
      }
    ]
  }
}
Go
Go
doc1 := map[string]interface{} {"title": "What a nice bag"}
doc2 := map[string]interface{} {"title": "Really nice shoes"}
query := []interface{} {doc1, doc2}
percolateRequestQuery := manticoreclient.NewPercolateQuery(query)
percolateRequest := manticoreclient.NewPercolateRequest(percolateRequestQuery)
res, _, _ := apiClient.SearchAPI.Percolate(context.Background(), "test_pq").PercolateRequest(*percolateRequest).Execute()
{
  "took": 0,
  "timed_out": false,
  "hits": {
    "total": 2,
    "hits": [
      {
        "_index": "test_pq",
        "_type": "doc",
        "_id": "1657852401006149661",
        "_score": "1",
        "_source": {
          "query": {
            "ql": "@title bag"
          }
        },
        "fields": {
          "_percolator_document_slot": [
            1
          ]
        }
      },
      {
        "_index": "test_pq",
        "_type": "doc",
        "_id": "1657852401006149662",
        "_score": "1",
        "_source": {
          "query": {
            "ql": "@title shoes"
          }
        },
        "fields": {
          "_percolator_document_slot": [
            1
          ]
        }
      }
    ]
  }
}

Using the option 1 as docs allows you to see which documents of the provided ones match which rules.

SQL:

SQL
CALL PQ('products', '[{"title": "nice pair of shoes", "color": "blue"}, {"title": "beautiful bag"}]', 1 as query, 1 as docs);
+---------------------+-----------+--------------+------+---------------------------+
| id                  | documents | query        | tags | filters                   |
+---------------------+-----------+--------------+------+---------------------------+
| 1657852401006149635 | 1         | @title shoes |      | color IN ('blue, 'green') |
| 1657852401006149637 | 2         | @title bag   |      |                           |
+---------------------+-----------+--------------+------+---------------------------+
JSON:
JSON
POST /pq/products/_search
{
  "query": {
    "percolate": {
      "documents": [
        {"title": "nice pair of shoes", "color": "blue"},
        {"title": "beautiful bag"}
      ]
    }
  }
}
{
  "took": 0,
  "timed_out": false,
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "products",
        "_type": "doc",
        "_id": "1657852401006149644",
        "_score": "1",
        "_source": {
          "query": {
            "ql": "@title bag"
          }
        },
        "fields": {
          "_percolator_document_slot": [
            2
          ]
        }
      },
      {
        "_index": "products",
        "_type": "doc",
        "_id": "1657852401006149646",
        "_score": "1",
        "_source": {
          "query": {
            "ql": "@title shoes"
          }
        },
        "fields": {
          "_percolator_document_slot": [
            1
          ]
        }
      }
    ]
  }
}
PHP:
PHP
$percolate = [
    'index' => 'products',
    'body' => [
        'query' => [
            'percolate' => [
                'documents' => [
                    ['title' => 'nice pair of shoes','color'=>'blue'],
                    ['title' => 'beautiful bag']
                ]
            ]
        ]
    ]
];
$client->pq()->search($percolate);
Array
(
    [took] => 23
    [timed_out] =>
    [hits] => Array
        (
            [total] => 2
            [max_score] => 1
            [hits] => Array
                (
                    [0] => Array
                        (
                            [_index] => products
                            [_type] => doc
                            [_id] => 2810781492890828819
                            [_score] => 1
                            [_source] => Array
                                (
                                    [query] => Array
                                        (
                                            [match] => Array
                                                (
                                                    [title] => bag
                                                )
                                        )
                                )
                            [fields] => Array
                                (
                                    [_percolator_document_slot] => Array
                                        (
                                            [0] => 2
                                        )
                                )
                        )
                    [1] => Array
                        (
                            [_index] => products
                            [_type] => doc
                            [_id] => 2810781492890828821
                            [_score] => 1
                            [_source] => Array
                                (
                                    [query] => Array
                                        (
                                            [match] => Array
                                                (
                                                    [title] => shoes
                                                )
                                        )
                                )
                            [fields] => Array
                                (
                                    [_percolator_document_slot] => Array
                                        (
                                            [0] => 1
                                        )
                                )
                        )
                )
        )
)
Python
Python
searchApi.percolate('products',{"query":{"percolate":{"documents":[{"title":"nice pair of shoes","color":"blue"},{"title":"beautiful bag"}]}}})
{'hits': {'hits': [{u'_id': u'2811025403043381494',
                    u'_index': u'products',
                    u'_score': u'1',
                    u'_source': {u'query': {u'ql': u'@title bag'}},
                    u'_type': u'doc',
                    u'fields': {u'_percolator_document_slot': [2]}},
                   {u'_id': u'2811025403043381496',
                    u'_index': u'products',
                    u'_score': u'1',
                    u'_source': {u'query': {u'ql': u'@title shoes'}},
                    u'_type': u'doc',
                    u'fields': {u'_percolator_document_slot': [1]}}],
          'total': 2},
 'profile': None,
 'timed_out': False,
 'took': 0}
javascript
javascript
res = await searchApi.percolate('products',{"query":{"percolate":{"documents":[{"title":"nice pair of shoes","color":"blue"},{"title":"beautiful bag"}]}}});
{
  "took": 6,
  "timed_out": false,
  "hits": {
    "total": 2,
    "hits": [
      {
        "_index": "products",
        "_type": "doc",
        "_id": "2811045522851233808",
        "_score": "1",
        "_source": {
          "query": {
            "ql": "@title bag"
          }
        },
        "fields": {
          "_percolator_document_slot": [
            2
          ]
        }
      },
      {
        "_index": "products",
        "_type": "doc",
        "_id": "2811045522851233810",
        "_score": "1",
        "_source": {
          "query": {
            "ql": "@title shoes"
          }
        },
        "fields": {
          "_percolator_document_slot": [
            1
          ]
        }
      }
    ]
  }
}
java
Java
percolateRequest = new PercolateRequest();
query = new HashMap<String,Object>(){{
        put("percolate",new HashMap<String,Object >(){{
            put("documents", new ArrayList<Object>(){{
                    add(new HashMap<String,Object >(){{
                        put("title","nice pair of shoes");
                        put("color","blue");
                    }});
                    add(new HashMap<String,Object >(){{
                        put("title","beautiful bag");

                    }});

                     }});
        }});
    }};
percolateRequest.query(query);
searchApi.percolate("products",percolateRequest);
class SearchResponse {
    took: 0
    timedOut: false
    hits: class SearchResponseHits {
        total: 2
        maxScore: 1
        hits: [{_index=products, _type=doc, _id=2811045522851234133, _score=1, _source={query={ql=@title bag}}, fields={_percolator_document_slot=[2]}}, {_index=products, _type=doc, _id=2811045522851234135, _score=1, _source={query={ql=@title shoes}}, fields={_percolator_document_slot=[1]}}]
        aggregations: null
    }
    profile: null
}
C#
C#
var doc1 = new Dictionary<string, Object>();
doc1.Add("title","nice pair of shoes");
doc1.Add("color","blue");
var doc2 = new Dictionary<string, Object>();
doc2.Add("title","beautiful bag");
var docs = new List<Object> {doc1, doc2};
Dictionary<string, Object> percolateDoc = new Dictionary<string, Object> {{ "documents", docs }}; 
Dictionary<string, Object> query = new Dictionary<string, Object> {{ "percolate", percolateDoc }}; 
PercolateRequest percolateRequest = new PercolateRequest(query=query);
searchApi.Percolate("products",percolateRequest);
class SearchResponse {
    took: 0
    timedOut: false
    hits: class SearchResponseHits {
        total: 2
        maxScore: 1
        hits: [{_index=products, _type=doc, _id=2811045522851234133, _score=1, _source={query={ql=@title bag}}, fields={_percolator_document_slot=[2]}}, {_index=products, _type=doc, _id=2811045522851234135, _score=1, _source={query={ql=@title shoes}}, fields={_percolator_document_slot=[1]}}]
        aggregations: null
    }
    profile: null
}
TypeScript
TypeScript
docs = [ {title : 'What a nice bag'}, {title : 'Really nice shoes'} ]; 
res = await searchApi.percolate('test_pq', { query: { percolate: { documents : docs } } } );
{
  "took": 0,
  "timed_out": false,
  "hits": {
    "total": 2,
    "hits": [
      {
        "_index": "test_pq",
        "_type": "doc",
        "_id": "1657852401006149661",
        "_score": "1",
        "_source": {
          "query": {
            "ql": "@title bag"
          }
        },
        "fields": {
          "_percolator_document_slot": [
            1
          ]
        }
      },
      {
        "_index": "test_pq",
        "_type": "doc",
        "_id": "1657852401006149662",
        "_score": "1",
        "_source": {
          "query": {
            "ql": "@title shoes"
          }
        },
        "fields": {
          "_percolator_document_slot": [
            1
          ]
        }
      }
    ]
  }
}
Go
Go
doc1 := map[string]interface{} {"title": "What a nice bag"}
doc2 := map[string]interface{} {"title": "Really nice shoes"}
query := []interface{} {doc1, doc2}
percolateRequestQuery := manticoreclient.NewPercolateQuery(query)
percolateRequest := manticoreclient.NewPercolateRequest(percolateRequestQuery)
res, _, _ := apiClient.SearchAPI.Percolate(context.Background(), "test_pq").PercolateRequest(*percolateRequest).Execute()
{
  "took": 0,
  "timed_out": false,
  "hits": {
    "total": 2,
    "hits": [
      {
        "_index": "test_pq",
        "_type": "doc",
        "_id": "1657852401006149661",
        "_score": "1",
        "_source": {
          "query": {
            "ql": "@title bag"
          }
        },
        "fields": {
          "_percolator_document_slot": [
            1
          ]
        }
      },
      {
        "_index": "test_pq",
        "_type": "doc",
        "_id": "1657852401006149662",
        "_score": "1",
        "_source": {
          "query": {
            "ql": "@title shoes"
          }
        },
        "fields": {
          "_percolator_document_slot": [
            1
          ]
        }
      }
    ]
  }
}

Static ids

By default, matching document ids correspond to their relative numbers in the list you provide. However, in some cases, each document already has its own id. For this case, there's an option 'id field name' as docs_id for CALL PQ.

Note that if the id cannot be found by the provided field name, the PQ rule will not be shown in the results.

This option is only available for CALL PQ via SQL.

SQL
CALL PQ('products', '[{"id": 123, "title": "nice pair of shoes", "color": "blue"}, {"id": 456, "title": "beautiful bag"}]', 1 as query, 'id' as docs_id, 1 as docs);
+---------------------+-----------+--------------+------+---------------------------+
| id                  | documents | query        | tags | filters                   |
+---------------------+-----------+--------------+------+---------------------------+
| 1657852401006149664 | 456       | @title bag   |      |                           |
| 1657852401006149666 | 123       | @title shoes |      | color IN ('blue, 'green') |
+---------------------+-----------+--------------+------+---------------------------+

When using CALL PQ with separate JSONs, you can use the option 1 as skip_bad_json to skip any invalid JSONs in the input. In the example below, the 2nd query fails due to an invalid JSON, but the 3rd query avoids the error by using 1 as skip_bad_json. Keep in mind that this option is not available when sending JSON queries over HTTP, as the whole JSON query must be valid in that case.

SQL:

SQL
CALL PQ('products', ('{"title": "nice pair of shoes", "color": "blue"}', '{"title": "beautiful bag"}'));

CALL PQ('products', ('{"title": "nice pair of shoes", "color": "blue"}', '{"title": "beautiful bag}'));

CALL PQ('products', ('{"title": "nice pair of shoes", "color": "blue"}', '{"title": "beautiful bag}'), 1 as skip_bad_json);
+---------------------+
| id                  |
+---------------------+
| 1657852401006149635 |
| 1657852401006149637 |
+---------------------+

ERROR 1064 (42000): Bad JSON objects in strings: 2

+---------------------+
| id                  |
+---------------------+
| 1657852401006149635 |
+---------------------+
I want higher performance of a percolate query

Percolate queries are designed with high throughput and large data volumes in mind. To optimize performance for lower latency and higher throughput, consider the following.

There are two modes of distribution for a percolate table and how a percolate query can work against it:

Assume you have table pq_d2 defined as:

table pq_d2
{
    type = distributed
    agent = 127.0.0.1:6712:pq
    agent = 127.0.0.1:6712:ptitle
}

Each of 'pq' and 'ptitle' contains:

SQL
SELECT * FROM pq;
+------+-------------+------+-------------------+
| id   | query       | tags | filters           |
+------+-------------+------+-------------------+
|    1 | filter test |      | gid>=10           |
|    2 | angry       |      | gid>=10 OR gid<=3 |
+------+-------------+------+-------------------+
2 rows in set (0.01 sec)
JSON
POST /pq/pq/_search
{
    "took":0,
    "timed_out":false,
    "hits":{
        "total":2,
        "hits":[
            {
                "_id":"1",
                "_score":1,
                "_source":{
                    "query":{ "ql":"filter test" },
                    "tags":"",
                    "filters":"gid>=10"
                }
            },
            {
                "_id":"2",
                "_score":1,
                "_source":{
                    "query":{"ql":"angry"},
                    "tags":"",
                    "filters":"gid>=10 OR gid<=3"
                }
            }            
        ]
    }
}
PHP
$params = [
    'index' => 'pq',
    'body' => [
    ]
];
$response = $client->pq()->search($params);
(
    [took] => 0
    [timed_out] =>
    [hits] =>
        (
            [total] => 2
            [hits] =>
                (
                    [0] =>
                        (
                            [_id] => 1
                            [_score] => 1
                            [_source] =>
                                (
                                    [query] =>
                                        (
                                            [ql] => filter test
                                        )
                                    [tags] =>
                                    [filters] => gid>=10
                                )
                        ),
                    [1] =>
                        (
                            [_id] => 1
                            [_score] => 1
                            [_source] =>
                                (
                                    [query] =>
                                        (
                                            [ql] => angry
                                        )
                                    [tags] =>
                                    [filters] => gid>=10 OR gid<=3
                                )
                        )
                )
        )
)
Python
Python
searchApi.search({"index":"pq","query":{"match_all":{}}})
{'hits': {'hits': [{u'_id': u'2811025403043381501',
                    u'_score': 1,
                    u'_source': {u'filters': u"gid>=10",
                                 u'query': u'filter test',
                                 u'tags': u''}},
                   {u'_id': u'2811025403043381502',
                    u'_score': 1,
                    u'_source': {u'filters': u"gid>=10 OR gid<=3",
                                 u'query': u'angry',
                                 u'tags': u''}}],
          'total': 2},
 'profile': None,
 'timed_out': False,
 'took': 0}
javascript
javascript
res = await searchApi.search({"index":"pq","query":{"match_all":{}}});
{'hits': {'hits': [{u'_id': u'2811025403043381501',
                    u'_score': 1,
                    u'_source': {u'filters': u"gid>=10",
                                 u'query': u'filter test',
                                 u'tags': u''}},
                   {u'_id': u'2811025403043381502',
                    u'_score': 1,
                    u'_source': {u'filters': u"gid>=10 OR gid<=3",
                                 u'query': u'angry',
                                 u'tags': u''}}],
          'total': 2},
 'profile': None,
 'timed_out': False,
 'took': 0}
javascript
javascript
res = await searchApi.search({"index":"pq","query":{"match_all":{}}});
{"hits": {"hits": [{"_id": "2811025403043381501",
                    "_score": 1,
                    "_source": {"filters": u"gid>=10",
                                 "query": "filter test",
                                 "tags": ""}},
                   {"_id": "2811025403043381502",
                    "_score": 1,
                    "_source": {"filters": u"gid>=10 OR gid<=3",
                                 "query": "angry",
                                 "tags": ""}}],
          "total": 2},
  "timed_out": false,
 "took": 0}
java
Java
Map<String,Object> query = new HashMap<String,Object>();
query.put("match_all",null);
SearchRequest searchRequest = new SearchRequest();
searchRequest.setIndex("pq");
searchRequest.setQuery(query);
SearchResponse searchResponse = searchApi.search(searchRequest);
class SearchResponse {
    took: 0
    timedOut: false
    hits: class SearchResponseHits {
        total: 2
        maxScore: null
        hits: [{_id=2811045522851233962, _score=1, _source={filters=gid>=10, query=filter test, tags=}}, {_id=2811045522851233951, _score=1, _source={filters=gid>=10 OR gid<=3, query=angry,tags=}}]
        aggregations: null
    }
    profile: null
}
C#
C#
object query =  new { match_all=null };
SearchRequest searchRequest = new SearchRequest("pq", query);
SearchResponse searchResponse = searchApi.Search(searchRequest);
class SearchResponse {
    took: 0
    timedOut: false
    hits: class SearchResponseHits {
        total: 2
        maxScore: null
        hits: [{_id=2811045522851233962, _score=1, _source={filters=gid>=10, query=filter test, tags=}}, {_id=2811045522851233951, _score=1, _source={filters=gid>=10 OR gid<=3, query=angry,tags=}}]
        aggregations: null
    }
    profile: null
}
TypeScript
TypeScript
res = await searchApi.search({"index":"test_pq","query":{"match_all":{}}});
{
    'hits':
    {
        'hits': 
        [{
            '_id': '2811025403043381501',
            '_score': 1,
            '_source': 
            {
                'filters': "gid>=10",
                'query': 'filter test',
                'tags': ''
            }
        },
        {
            '_id': 
            '2811025403043381502',
            '_score': 1,
            '_source': 
            {
                'filters': "gid>=10 OR gid<=3",
                 'query': 'angry',
                 'tags': ''
            }
        }],
        'total': 2
    },
    'profile': None,
    'timed_out': False,
    'took': 0
}
Go
Go
query := map[string]interface{} {}
percolateRequestQuery := manticoreclient.NewPercolateRequestQuery(query)
percolateRequest := manticoreclient.NewPercolateRequest(percolateRequestQuery) 
res, _, _ := apiClient.SearchAPI.Percolate(context.Background(), "test_pq").PercolateRequest(*percolateRequest).Execute()
{
    'hits':
    {
        'hits': 
        [{
            '_id': '2811025403043381501',
            '_score': 1,
            '_source': 
            {
                'filters': "gid>=10",
                'query': 'filter test',
                'tags': ''
            }
        },
        {
            '_id': 
            '2811025403043381502',
            '_score': 1,
            '_source': 
            {
                'filters': "gid>=10 OR gid<=3",
                 'query': 'angry',
                 'tags': ''
            }
        }],
        'total': 2
    },
    'profile': None,
    'timed_out': False,
    'took': 0
}

And you execute CALL PQ on the distributed table with a couple of documents.

SQL
CALL PQ ('pq_d2', ('{"title":"angry test", "gid":3 }', '{"title":"filter test doc2", "gid":13}'), 1 AS docs);
+------+-----------+
| id   | documents |
+------+-----------+
|    1 | 2         |
|    2 | 1         |
+------+-----------+
JSON
POST /pq/pq/_search -d '
"query":
{
        "percolate":
        {
                "documents" : [
                    { "title": "angry test", "gid": 3 },
                    { "title": "filter test doc2", "gid": 13 }
                ]
        }
}
'
{
    "took":0,
    "timed_out":false,
    "hits":{
    "total":2,"hits":[
        {
            "_id":"2",
            "_score":1,
            "_source":{
                "query":{"title":"angry"},
                "tags":"",
                "filters":"gid>=10 OR gid<=3"
            }
        }
        {
            "_id":"1",
            "_score":1,
            "_source":{
                "query":{"ql":"filter test"},
                "tags":"",
                "filters":"gid>=10"
            }
        },
        ]
    }
}
PHP
$params = [
    'index' => 'pq',
    'body' => [
        'query' => [
            'percolate' => [
                'documents' => [
                    [
                        'title'=>'angry test',
                        'gid' => 3
                    ],
                    [
                        'title'=>'filter test doc2',
                        'gid' => 13
                    ],
                ]
            ]
        ]
    ]
];
$response = $client->pq()->search($params);
(
    [took] => 0
    [timed_out] =>
    [hits] =>
        (
            [total] => 2
            [hits] =>
                (
                    [0] =>
                        (
                            [_index] => pq  
                            [_type] => doc                            
                            [_id] => 2
                            [_score] => 1
                            [_source] =>
                                (
                                    [query] =>
                                        (
                                            [ql] => angry
                                        )
                                    [tags] =>
                                    [filters] => gid>=10 OR gid<=3
                                ),
                           [fields] =>
                                (
                                    [_percolator_document_slot] =>
                                        (
                                            [0] => 1
                                        )

                                )
                        ),
                    [1] =>
                        (
                            [_index] => pq                            
                            [_id] => 1
                            [_score] => 1
                            [_source] =>
                                (
                                    [query] =>
                                        (
                                            [ql] => filter test
                                        )
                                    [tags] =>
                                    [filters] => gid>=10
                                )
                           [fields] =>
                                (
                                    [_percolator_document_slot] =>
                                        (
                                            [0] => 0
                                        )

                                )
                        )
                )
        )
)
Python
Python
searchApi.percolate('pq',{"percolate":{"documents":[{"title":"angry test","gid":3},{"title":"filter test doc2","gid":13}]}})
{'hits': {'hits': [{u'_id': u'2811025403043381480',
                    u'_index': u'pq',
                    u'_score': u'1',
                    u'_source': {u'query': {u'ql': u'angry'},u'tags':u'',u'filters':u"gid>=10 OR gid<=3"},
                    u'_type': u'doc',
                    u'fields': {u'_percolator_document_slot': [1]}},
                    {u'_id': u'2811025403043381501',
                    u'_index': u'pq',
                    u'_score': u'1',
                    u'_source': {u'query': {u'ql': u'filter test'},u'tags':u'',u'filters':u"gid>=10"},
                    u'_type': u'doc',
                    u'fields': {u'_percolator_document_slot': [1]}}],
          'total': 2},
 'profile': None,
 'timed_out': False,
 'took': 0}
javascript
javascript
res = await searchApi.percolate('pq',{"percolate":{"documents":[{"title":"angry test","gid":3},{"title":"filter test doc2","gid":13}]}});
{'hits': {'hits': [{u'_id': u'2811025403043381480',
                    u'_index': u'pq',
                    u'_score': u'1',
                    u'_source': {u'query': {u'ql': u'angry'},u'tags':u'',u'filters':u"gid>=10 OR gid<=3"},
                    u'_type': u'doc',
                    u'fields': {u'_percolator_document_slot': [1]}},
                    {u'_id': u'2811025403043381501',
                    u'_index': u'pq',
                    u'_score': u'1',
                    u'_source': {u'query': {u'ql': u'filter test'},u'tags':u'',u'filters':u"gid>=10"},
                    u'_type': u'doc',
                    u'fields': {u'_percolator_document_slot': [1]}}],
          'total': 2},
 'profile': None,
 'timed_out': False,
 'took': 0}
java
Java
percolateRequest = new PercolateRequest();
query = new HashMap<String,Object>(){{
    put("percolate",new HashMap<String,Object >(){{
        put("documents", new ArrayList<Object>(){{
            add(new HashMap<String,Object >(){{
                put("title","angry test");
                put("gid",3);
            }});
            add(new HashMap<String,Object >(){{
                put("title","filter test doc2");
                put("gid",13);
            }});
        }});
    }});
}};
percolateRequest.query(query);
searchApi.percolate("pq",percolateRequest);
class SearchResponse {
    took: 10
    timedOut: false
    hits: class SearchResponseHits {
        total: 2
        maxScore: 1
        hits: [{_index=pq, _type=doc, _id=2811045522851234165, _score=1, _source={query={ql=@title angry}}, fields={_percolator_document_slot=[1]}}, {_index=pq, _type=doc, _id=2811045522851234166, _score=1, _source={query={ql=@title filter test doc2}}, fields={_percolator_document_slot=[2]}}]
        aggregations: null
    }
    profile: null
}
C#
C#
var doc1 = new Dictionary<string, Object>();
doc1.Add("title","angry test");
doc1.Add("gid",3);
var doc2 = new Dictionary<string, Object>();
doc2.Add("title","filter test doc2");
doc2.Add("gid",13);
var docs = new List<Object> {doc1, doc2};
Dictionary<string, Object> percolateDoc = new Dictionary<string, Object> {{ "documents", docs }}; 
Dictionary<string, Object> query = new Dictionary<string, Object> {{ "percolate", percolateDoc }}; 
PercolateRequest percolateRequest = new PercolateRequest(query=query);
searchApi.Percolate("pq",percolateRequest);
class SearchResponse {
    took: 10
    timedOut: false
    hits: class SearchResponseHits {
        total: 2
        maxScore: 1
        hits: [{_index=pq, _type=doc, _id=2811045522851234165, _score=1, _source={query={ql=@title angry}}, fields={_percolator_document_slot=[1]}}, {_index=pq, _type=doc, _id=2811045522851234166, _score=1, _source={query={ql=@title filter test doc2}}, fields={_percolator_document_slot=[2]}}]
        aggregations: null
    }
    profile: null
}
TypeScript
TypeScript
docs = [ {title : 'What a nice bag'}, {title : 'Really nice shoes'} ]; 
res = await searchApi.percolate('test_pq', { query: { percolate: { documents : docs } } } );
{
  "took": 0,
  "timed_out": false,
  "hits": {
    "total": 2,
    "hits": [
      {
        "_index": "test_pq",
        "_type": "doc",
        "_id": "1657852401006149661",
        "_score": "1",
        "_source": {
          "query": {
            "ql": "@title bag"
          }
        },
        "fields": {
          "_percolator_document_slot": [
            1
          ]
        }
      },
      {
        "_index": "test_pq",
        "_type": "doc",
        "_id": "1657852401006149662",
        "_score": "1",
        "_source": {
          "query": {
            "ql": "@title shoes"
          }
        },
        "fields": {
          "_percolator_document_slot": [
            1
          ]
        }
      }
    ]
  }
}
Go
Go
doc1 := map[string]interface{} {"title": "What a nice bag"}
doc2 := map[string]interface{} {"title": "Really nice shoes"}
query := []interface{} {doc1, doc2}
percolateRequestQuery := manticoreclient.NewPercolateQuery(query)
percolateRequest := manticoreclient.NewPercolateRequest(percolateRequestQuery)
res, _, _ := apiClient.SearchAPI.Percolate(context.Background(), "test_pq").PercolateRequest(*percolateRequest).Execute()
{
  "took": 0,
  "timed_out": false,
  "hits": {
    "total": 2,
    "hits": [
      {
        "_index": "test_pq",
        "_type": "doc",
        "_id": "1657852401006149661",
        "_score": "1",
        "_source": {
          "query": {
            "ql": "@title bag"
          }
        },
        "fields": {
          "_percolator_document_slot": [
            1
          ]
        }
      },
      {
        "_index": "test_pq",
        "_type": "doc",
        "_id": "1657852401006149662",
        "_score": "1",
        "_source": {
          "query": {
            "ql": "@title shoes"
          }
        },
        "fields": {
          "_percolator_document_slot": [
            1
          ]
        }
      }
    ]
  }
}

In the previous example, we used the default sparse mode. To demonstrate the sharded mode, let's create a distributed PQ table consisting of 2 local PQ tables and add 2 documents to "products1" and 1 document to "products2":

create table products1(title text, color string) type='pq';
create table products2(title text, color string) type='pq';
create table products_distributed type='distributed' local='products1' local='products2';

INSERT INTO products1(query) values('@title bag');
INSERT INTO products1(query,filters) values('@title shoes', 'color=\'red\'');
INSERT INTO products2(query,filters) values('@title shoes', 'color in (\'blue\', \'green\')');

Now, if you add 'sharded' as mode to CALL PQ, it will send the documents to all the agent's tables (in this case, just local tables, but they can be remote to utilize external hardware). This mode is not available via the JSON interface.

SQL:

SQL
CALL PQ('products_distributed', ('{"title": "nice pair of shoes", "color": "blue"}', '{"title": "beautiful bag"}'), 'sharded' as mode, 1 as query);
+---------------------+--------------+------+---------------------------+
| id                  | query        | tags | filters                   |
+---------------------+--------------+------+---------------------------+
| 1657852401006149639 | @title bag   |      |                           |
| 1657852401006149643 | @title shoes |      | color IN ('blue, 'green') |
+---------------------+--------------+------+---------------------------+

Note that the syntax of agent mirrors in the configuration (when several hosts are assigned to one agent line, separated with |) has nothing to do with the CALL PQ query mode. Each agent always represents one node, regardless of the number of HA mirrors specified for that agent.

In some cases, you might want to get more details about the performance of a percolate query. For that purpose, there is the option 1 as verbose, which is only available via SQL and allows you to save more performance metrics. You can see them using the SHOW META query, which you can run after CALL PQ. See SHOW META for more info.

1 as verbose:

1 as verbose
CALL PQ('products', ('{"title": "nice pair of shoes", "color": "blue"}', '{"title": "beautiful bag"}'), 1 as verbose); show meta;
+---------------------+
| id                  |
+---------------------+
| 1657852401006149644 |
| 1657852401006149646 |
+---------------------+
+-------------------------+-----------+
| Name                    | Value     |
+-------------------------+-----------+
| Total                   | 0.000 sec |
| Setup                   | 0.000 sec |
| Queries matched         | 2         |
| Queries failed          | 0         |
| Document matched        | 2         |
| Total queries stored    | 3         |
| Term only queries       | 3         |
| Fast rejected queries   | 0         |
| Time per query          | 27, 10    |
| Time of matched queries | 37        |
+-------------------------+-----------+
0 as verbose (default):
0 as verbose
CALL PQ('products', ('{"title": "nice pair of shoes", "color": "blue"}', '{"title": "beautiful bag"}'), 0 as verbose); show meta;
+---------------------+
| id                  |
+---------------------+
| 1657852401006149644 |
| 1657852401006149646 |
+---------------------+
+-----------------------+-----------+
| Name                  | Value     |
+-----------------------+-----------+
| Total                 | 0.000 sec |
| Queries matched       | 2         |
| Queries failed        | 0         |
| Document matched      | 2         |
| Total queries stored  | 3         |
| Term only queries     | 3         |
| Fast rejected queries | 0         |
+-----------------------+-----------+
¶ 13.17

Autocomplete

Autocomplete (or word completion) is a feature in which an application predicts the rest of a word a user is typing. On websites, it's used in search boxes, where a user starts to type a word, and a dropdown with suggestions pops up so the user can select the ending from the list.

Autocomplete

There are a few ways you can do autocomplete in Manticore:

Autocomplete a sentence

To autocomplete a sentence, you can use infixed search. You can find endings of a document's field by providing its beginning and:

There is an article about it in our blog and an interactive course. A quick example is:

Autocomplete a word

In some cases, all you need is to autocomplete a single word or a couple of words. In this case, you can use CALL KEYWORDS.

CALL KEYWORDS

CALL KEYWORDS is available through the SQL interface and offers a way to examine how keywords are tokenized or to obtain the tokenized forms of specific keywords. If the table enables infixes, it allows you to quickly find possible endings for given keywords, making it suitable for autocomplete functionality.

This is a great alternative to general infixed search, as it provides higher performance since it only needs the table's dictionary, not the documents themselves.

General syntax

CALL KEYWORDS(text, table [, options])

The CALL KEYWORDS statement divides text into keywords. It returns the tokenized and normalized forms of the keywords, and if desired, keyword statistics. Additionally, it provides the position of each keyword in the query and all forms of tokenized keywords when the table enables lemmatizers.

Parameter Description
text Text to break down to keywords
table Name of the table from which to take the text processing settings
0/1 as stats Show statistics of keywords, default is 0
0/1 as fold_wildcards Fold wildcards, default is 0
0/1 as fold_lemmas Fold morphological lemmas, default is 0
0/1 as fold_blended Fold blended words, default is 0
N as expansion_limit Override expansion_limit defined in the server configuration, default is 0 (use value from the configuration)
docs/hits as sort_mode Sort output results by either 'docs' or 'hits'. Default no sorting

The examples show how it works if assuming the user is trying to get an autocomplete for "my cat ...". So on the application side all you need to do is to suggest the user the endings from the column "normalized" for each new word. It often makes sense to sort by hits or docs using 'hits' as sort_mode or 'docs' as sort_mode.

Examples
MySQL [(none)]> CALL KEYWORDS('m*', 't', 1 as stats);
+------+-----------+------------+------+------+
| qpos | tokenized | normalized | docs | hits |
+------+-----------+------------+------+------+
| 1    | m*        | my         | 1    | 2    |
| 1    | m*        | mammal     | 1    | 1    |
+------+-----------+------------+------+------+

MySQL [(none)]> CALL KEYWORDS('my*', 't', 1 as stats);
+------+-----------+------------+------+------+
| qpos | tokenized | normalized | docs | hits |
+------+-----------+------------+------+------+
| 1    | my*       | my         | 1    | 2    |
+------+-----------+------------+------+------+

MySQL [(none)]> CALL KEYWORDS('c*', 't', 1 as stats, 'hits' as sort_mode);
+------+-----------+-------------+------+------+
| qpos | tokenized | normalized  | docs | hits |
+------+-----------+-------------+------+------+
| 1    | c*        | cat         | 1    | 2    |
| 1    | c*        | carnivorous | 1    | 1    |
| 1    | c*        | catus       | 1    | 1    |
+------+-----------+-------------+------+------+

MySQL [(none)]> CALL KEYWORDS('ca*', 't', 1 as stats, 'hits' as sort_mode);
+------+-----------+-------------+------+------+
| qpos | tokenized | normalized  | docs | hits |
+------+-----------+-------------+------+------+
| 1    | ca*       | cat         | 1    | 2    |
| 1    | ca*       | carnivorous | 1    | 1    |
| 1    | ca*       | catus       | 1    | 1    |
+------+-----------+-------------+------+------+

MySQL [(none)]> CALL KEYWORDS('cat*', 't', 1 as stats, 'hits' as sort_mode);
+------+-----------+------------+------+------+
| qpos | tokenized | normalized | docs | hits |
+------+-----------+------------+------+------+
| 1    | cat*      | cat        | 1    | 2    |
| 1    | cat*      | catus      | 1    | 1    |
+------+-----------+------------+------+------+

There is a nice trick how you can improve the above algorithm - use bigram_index. When you have it enabled for the table what you get in it is not just a single word, but each pair of words standing one after another indexed as a separate token.

This allows to predict not just the current word's ending, but the next word too which is especially beneficial for the purpose of autocomplete.

Examples
MySQL [(none)]> CALL KEYWORDS('m*', 't', 1 as stats, 'hits' as sort_mode);
+------+-----------+------------+------+------+
| qpos | tokenized | normalized | docs | hits |
+------+-----------+------------+------+------+
| 1    | m*        | my         | 1    | 2    |
| 1    | m*        | mammal     | 1    | 1    |
| 1    | m*        | my cat     | 1    | 1    |
| 1    | m*        | my dog     | 1    | 1    |
+------+-----------+------------+------+------+

MySQL [(none)]> CALL KEYWORDS('my*', 't', 1 as stats, 'hits' as sort_mode);
+------+-----------+------------+------+------+
| qpos | tokenized | normalized | docs | hits |
+------+-----------+------------+------+------+
| 1    | my*       | my         | 1    | 2    |
| 1    | my*       | my cat     | 1    | 1    |
| 1    | my*       | my dog     | 1    | 1    |
+------+-----------+------------+------+------+

MySQL [(none)]> CALL KEYWORDS('c*', 't', 1 as stats, 'hits' as sort_mode);
+------+-----------+--------------------+------+------+
| qpos | tokenized | normalized         | docs | hits |
+------+-----------+--------------------+------+------+
| 1    | c*        | cat                | 1    | 2    |
| 1    | c*        | carnivorous        | 1    | 1    |
| 1    | c*        | carnivorous mammal | 1    | 1    |
| 1    | c*        | cat felis          | 1    | 1    |
| 1    | c*        | cat loves          | 1    | 1    |
| 1    | c*        | catus              | 1    | 1    |
| 1    | c*        | catus is           | 1    | 1    |
+------+-----------+--------------------+------+------+

MySQL [(none)]> CALL KEYWORDS('ca*', 't', 1 as stats, 'hits' as sort_mode);
+------+-----------+--------------------+------+------+
| qpos | tokenized | normalized         | docs | hits |
+------+-----------+--------------------+------+------+
| 1    | ca*       | cat                | 1    | 2    |
| 1    | ca*       | carnivorous        | 1    | 1    |
| 1    | ca*       | carnivorous mammal | 1    | 1    |
| 1    | ca*       | cat felis          | 1    | 1    |
| 1    | ca*       | cat loves          | 1    | 1    |
| 1    | ca*       | catus              | 1    | 1    |
| 1    | ca*       | catus is           | 1    | 1    |
+------+-----------+--------------------+------+------+

MySQL [(none)]> CALL KEYWORDS('cat*', 't', 1 as stats, 'hits' as sort_mode);
+------+-----------+------------+------+------+
| qpos | tokenized | normalized | docs | hits |
+------+-----------+------------+------+------+
| 1    | cat*      | cat        | 1    | 2    |
| 1    | cat*      | cat felis  | 1    | 1    |
| 1    | cat*      | cat loves  | 1    | 1    |
| 1    | cat*      | catus      | 1    | 1    |
| 1    | cat*      | catus is   | 1    | 1    |
+------+-----------+------------+------+------+

CALL KEYWORDS supports distributed tables so no matter how big your data set you can benefit from using it.

¶ 13.18

Spell correction

Spell correction, also known as:

and so on, is a software functionality that suggests alternatives to or makes automatic corrections of the text you have typed in. The concept of correcting typed text dates back to the 1960s when computer scientist Warren Teitelman, who also invented the "undo" command, introduced a philosophy of computing called D.W.I.M., or "Do What I Mean." Instead of programming computers to accept only perfectly formatted instructions, Teitelman argued that they should be programmed to recognize obvious mistakes.

The first well-known product to provide spell correction functionality was Microsoft Word 6.0, released in 1993.

How it works

There are a few ways spell correction can be done, but it's important to note that there is no purely programmatic way to convert your mistyped "ipone" into "iphone" with decent quality. Mostly, there has to be a dataset the system is based on. The dataset can be:

Manticore provides the commands CALL QSUGGEST and CALL SUGGEST that can be used for automatic spell correction purposes.

CALL QSUGGEST, CALL SUGGEST

Both commands are available via SQL only, and the general syntax is:

CALL QSUGGEST(word, table [,options])
CALL SUGGEST(word, table [,options])

options: N as option_name[, M as another_option, ...]

These commands provide all suggestions from the dictionary for a given word. They work only on tables with infixing enabled and dict=keywords. They return the suggested keywords, Levenshtein distance between the suggested and original keywords, and the document statistics of the suggested keyword.

If the first parameter contains multiple words, then:

That's the only difference between them. Several options are supported for customization:

Option Description Default
limit Returns N top matches 5
max_edits Keeps only dictionary words with a Levenshtein distance less than or equal to N 4
result_stats Provides Levenshtein distance and document count of the found words 1 (enabled)
delta_len Keeps only dictionary words with a length difference less than N 3
max_matches Number of matches to keep 25
reject Rejected words are matches that are not better than those already in the match queue. They are put in a rejected queue that gets reset in case one actually can go in the match queue. This parameter defines the size of the rejected queue (as reject*max(max_matched,limit)). If the rejected queue is filled, the engine stops looking for potential matches 4
result_line alternate mode to display the data by returning all suggests, distances and docs each per one row 0
non_char do not skip dictionary words with non alphabet symbols 0 (skip such words)
sentence Returns the original sentence along with the last word replaced by the matched one. 0 (do not return the full sentence)

To show how it works, let's create a table and add a few documents to it.

create table products(title text) min_infix_len='2';
insert into products values (0,'Crossbody Bag with Tassel'), (0,'microfiber sheet set'), (0,'Pet Hair Remover Glove');

As you can see, the mistyped word "crossbUdy" gets corrected to "crossbody". By default, CALL SUGGEST/QSUGGEST return:

To disable the display of these statistics, you can use the option 0 as result_stats.

Example
call suggest('crossbudy', 'products');
+-----------+----------+------+
| suggest   | distance | docs |
+-----------+----------+------+
| crossbody | 1        | 1    |
+-----------+----------+------+

If the first parameter is not a single word, but multiple, then CALL SUGGEST will return suggestions only for the first word.

Example
call suggest('bagg with tasel', 'products');
+---------+----------+------+
| suggest | distance | docs |
+---------+----------+------+
| bag     | 1        | 1    |
+---------+----------+------+

If the first parameter is not a single word, but multiple, then CALL SUGGEST will return suggestions only for the last word.

Example
CALL QSUGGEST('bagg with tasel', 'products');
+---------+----------+------+
| suggest | distance | docs |
+---------+----------+------+
| tassel  | 1        | 1    |
+---------+----------+------+

Adding 1 as sentence makes CALL QSUGGEST return the entire sentence with the last word corrected.

Example
CALL QSUGGEST('bag with tasel', 'products', 1 as sentence);
+-------------------+----------+------+
| suggest           | distance | docs |
+-------------------+----------+------+
| bag with tassel   | 1        | 1    |
+-------------------+----------+------+
Different display mode

The 1 as result_line option changes the way the suggestions are displayed in the output. Instead of showing each suggestion in a separate row, it displays all suggestions, distances, and docs in a single row. Here's an example to demonstrate this:

Example:
CALL QSUGGEST('bagg with tasel', 'products', 1 as result_line);
+----------+--------+
| name     | value  |
+----------+--------+
| suggests | tassel |
| distance | 1      |
| docs     | 1      |
+----------+--------+

Interactive course

This interactive course demonstrates online how the spell correction feature works on a web page and experiment with different examples.

Typical flow with Manticore and a database

¶ 13.19

Query cache

Query cache stores compressed result sets in memory and reuses them for subsequent queries when possible. You can configure it using the following directives:

These settings can be changed on the fly using the SET GLOBAL statement:

mysql> SET GLOBAL qcache_max_bytes=128000000;

These changes are applied immediately, and cached result sets that no longer satisfy the constraints are immediately discarded. When reducing the cache size on the fly, MRU (most recently used) result sets win.

Query cache operates as follows. When enabled, every full-text search result is completely stored in memory. This occurs after full-text matching, filtering, and ranking, so essentially we store total_found {docid,weight} pairs. Compressed matches can consume anywhere from 2 bytes to 12 bytes per match on average, mostly depending on the deltas between subsequent docids. Once the query is complete, we check the wall time and size thresholds, and either save the compressed result set for reuse or discard it.

Note that the query cache's impact on RAM is not limited byqcache_max_bytes! If you run, for example, 10 concurrent queries, each matching up to 1M matches (after filters), then the peak temporary RAM usage will be in the range of 40 MB to 240 MB, even if the queries are fast enough and don't get cached.

Queries can use cache when the table, full-text query (i.e.,MATCH() contents), and ranker all match, and filters are compatible. This means:

Cache entries expire with TTL and also get invalidated on table rotation, or on TRUNCATE, or on ATTACH. Note that currently, entries are not invalidated on arbitrary RT table writes! So a cached query might return older results for the duration of its TTL.

You can inspect the current cache status with SHOW STATUS through the qcache_XXX variables:

mysql> SHOW STATUS LIKE 'qcache%';
+-----------------------+----------+
| Counter               | Value    |
+-----------------------+----------+
| qcache_max_bytes      | 16777216 |
| qcache_thresh_msec    | 3000     |
| qcache_ttl_sec        | 60       |
| qcache_cached_queries | 0        |
| qcache_used_bytes     | 0        |
| qcache_hits           | 0        |
+-----------------------+----------+
6 rows in set (0.00 sec)
¶ 13.20

Collations

Collations primarily impact string attribute comparisons. They define both the character set encoding and the strategy Manticore employs for comparing strings when performing ORDER BY or GROUP BY with a string attribute involved.

String attributes are stored as-is during indexing, and no character set or language information is attached to them. This is fine as long as Manticore only needs to store and return the strings to the calling application verbatim. However, when you ask Manticore to sort by a string value, the request immediately becomes ambiguous.

First, single-byte (ASCII, ISO-8859-1, or Windows-1251) strings need to be processed differently than UTF-8 strings, which may encode each character with a variable number of bytes. Thus, we need to know the character set type to properly interpret the raw bytes as meaningful characters.

Second, we also need to know the language-specific string sorting rules. For example, when sorting according to US rules in the en_US locale, the accented character ï (small letter i with diaeresis) should be placed somewhere after z. However, when sorting with French rules and the fr_FR locale in mind, it should be placed between i and j. Some other set of rules might choose to ignore accents altogether, allowing ï and i to be mixed arbitrarily.

Third, in some cases, we may require case-sensitive sorting, while in others, case-insensitive sorting is needed.

Collations encapsulate all of the following: the character set, the language rules, and the case sensitivity. Manticore currently provides four collations:

  1. libc_ci
  2. libc_cs
  3. utf8_general_ci
  4. binary

The first two collations rely on several standard C library (libc) calls and can thus support any locale installed on your system. They provide case-insensitive (_ci) and case-sensitive (_cs) comparisons, respectively. By default, they use the C locale, effectively resorting to bytewise comparisons. To change that, you need to specify a different available locale using the collation_libc_locale directive. The list of locales available on your system can usually be obtained with the locale command:

$ locale -a
C
en_AG
en_AU.utf8
en_BW.utf8
en_CA.utf8
en_DK.utf8
en_GB.utf8
en_HK.utf8
en_IE.utf8
en_IN
en_NG
en_NZ.utf8
en_PH.utf8
en_SG.utf8
en_US.utf8
en_ZA.utf8
en_ZW.utf8
es_ES
fr_FR
POSIX
ru_RU.utf8
ru_UA.utf8

The specific list of system locales may vary. Consult your OS documentation to install additional needed locales.

utf8_general_ci and binary locales are built-in into Manticore. The first one is a generic collation for UTF-8 data (without any so-called language tailoring); it should behave similarly to the utf8_general_ci collation in MySQL. The second one is a simple bytewise comparison.

Collation can be overridden via SQL on a per-session basis using the SET collation_connection statement. All subsequent SQL queries will use this collation. Otherwise, all queries will use the server default collation or as specified in the collation_server configuration directive. Manticore currently defaults to the libc_ci collation.

Collations affect all string attribute comparisons, including those within ORDER BY and GROUP BY, so differently ordered or grouped results can be returned depending on the collation chosen. Note that collations don't affect full-text searching; for that, use the charset_table.

¶ 13.21

Cost-based optimizer

When Manticore executes a fullscan query, it can either use a plain scan to check every document against the filters or employ additional data and/or algorithms to speed up query execution. Manticore uses a cost-based optimizer (CBO), also known as a "query optimizer" to determine which approach to take.

The CBO can also enhance the performance of full-text queries. See below for more details.

The CBO may decide to replace one or more query filters with one of the following entities if it determines that doing so will improve performance:

  1. A docid index utilizes a special docid-only secondary index stored in files with the .spt extension. Besides improving filters on document IDs, the docid index is also used to accelerate document ID to row ID lookups and to speed up the application of large killlists during daemon startup.
  2. A columnar scan relies on columnar storage and can only be used on a columnar attribute. It scans every value and tests it against the filter, but it is heavily optimized and is typically faster than the default approach.
  3. Secondary indexes are generated for all attributes by default. They use the PGM index along with Manticore's built-in inverted index to retrieve the list of row IDs corresponding to a value or range of values. Secondary indexes are stored in files with the .spidx extension.

The optimizer estimates the cost of each execution path using various attribute statistics, including:

  1. Information on the data distribution within an attribute (histograms, stored in .sphi files). Histograms are generated automatically when data is indexed and serve as the primary source of information for the CBO.
  2. Information from PGM (secondary indexes), which helps estimate the number of document lists to read. This assists in gauging doclist merge performance and in selecting the appropriate merge algorithm (priority queue merge or bitmap merge).
  3. Columnar encoding statistics, employed to estimate columnar data decompression performance.
  4. A columnar min-max tree. While the CBO uses histograms to estimate the number of documents left after applying the filter, it also needs to determine how many documents the filter had to process. For columnar attributes, partial evaluation of the min-max tree serves this purpose.
  5. Full-text dictionary. The CBO utilizes term stats to estimate the cost of evaluating the full-text tree.

The optimizer computes the execution cost for every filter used in a query. Since certain filters can be replaced with several different entities (e.g., for a document id, Manticore can use a plain scan, a docid index lookup, a columnar scan (if the document id is columnar), and a secondary index), the optimizer evaluates all available combinations. However, there is a maximum limit of 1024 combinations.

To estimate query execution costs, the optimizer calculates the estimated costs of the most significant operations performed when executing the query. It uses preset constants to represent the cost of each operation.

The optimizer compares the costs of each execution path and chooses the path with the lowest cost to execute the query.

When working with full-text queries that have filters by attributes, the query optimizer decides between two possible execution paths. One is to execute the full-text query, retrieve the matches, and use filters. The other is to replace filters with one or more entities described above, fetch rowids from them, and inject them into the full-text matching tree. This way, full-text search results will intersect with full-scan results. The query optimizer estimates the cost of full-text tree evaluation and the best possible path for computing filter results. Using this information, the optimizer chooses the execution path.

Another factor to consider is multithreaded query execution (when pseudo_sharding is enabled). The CBO is aware that some queries can be executed in multiple threads and takes this into account. The CBO prioritizes shorter query execution times (i.e., latency) over throughput. For instance, if a query using a columnar scan can be executed in multiple threads (and occupy multiple CPU cores) and is faster than a query executed in a single thread using secondary indexes, multithreaded execution will be preferred.

Queries using secondary indexes and docid indexes always run in a single thread, as benchmarks indicate that there is little to no benefit in making them multithreaded.

At present, the optimizer only uses CPU costs and does not take memory or disk usage into account.

¶ 14

Updating table schema

Updating table schema in RT mode

ALTER TABLE table ADD COLUMN column_name [{INTEGER|INT|BIGINT|FLOAT|BOOL|MULTI|MULTI64|JSON|STRING|TIMESTAMP|TEXT [INDEXED [ATTRIBUTE]]}] [engine='columnar']

ALTER TABLE table DROP COLUMN column_name

ALTER TABLE table MODIFY COLUMN column_name bigint

This feature only supports adding one field at a time for RT tables or the expansion of an int column to bigint. The supported data types are:

Important notes:

Example
mysql> desc rt;
+------------+-----------+
| Field      | Type      |
+------------+-----------+
| id         | bigint    |
| text       | field     |
| group_id   | uint      |
| date_added | timestamp |
+------------+-----------+

mysql> alter table rt add column test integer;

mysql> desc rt;
+------------+-----------+
| Field      | Type      |
+------------+-----------+
| id         | bigint    |
| text       | field     |
| group_id   | uint      |
| date_added | timestamp |
| test       | uint      |
+------------+-----------+

mysql> alter table rt drop column group_id;

mysql> desc rt;
+------------+-----------+
| Field      | Type      |
+------------+-----------+
| id         | bigint    |
| text       | field     |
| date_added | timestamp |
| test       | uint      |
+------------+-----------+

mysql> alter table rt add column title text indexed;

mysql> desc rt;
+------------+-----------+------------+
| Field      | Type      | Properties |
+------------+-----------+------------+
| id         | bigint    |            |
| text       | text      | indexed    |
| title      | text      | indexed    |
| date_added | timestamp |            |
| test       | uint      |            |
+------------+-----------+------------+

mysql> alter table rt add column title text attribute;

mysql> desc rt;
+------------+-----------+------------+
| Field      | Type      | Properties |
+------------+-----------+------------+
| id         | bigint    |            |
| text       | text      | indexed    |
| title      | text      | indexed    |
| date_added | timestamp |            |
| test       | uint      |            |
| title      | string    |            |
+------------+-----------+------------+

mysql> alter table rt drop column title;

mysql> desc rt;
+------------+-----------+------------+
| Field      | Type      | Properties |
+------------+-----------+------------+
| id         | bigint    |            |
| text       | text      | indexed    |
| title      | text      | indexed    |
| date_added | timestamp |            |
| test       | uint      |            |
+------------+-----------+------------+
mysql> alter table rt drop column title;

mysql> desc rt;
+------------+-----------+------------+
| Field      | Type      | Properties |
+------------+-----------+------------+
| id         | bigint    |            |
| text       | text      | indexed    |
| date_added | timestamp |            |
| test       | uint      |            |
+------------+-----------+------------+

Updating table FT settings in RT mode

ALTER TABLE table ft_setting='value'[, ft_setting2='value']

You can use ALTER to modify the full-text settings of your table in RT mode. However, it only affects new documents and not existing ones.
Example:

Example
mysql> create table rt(title text) charset_table='a,b,c';

mysql> insert into rt(title) values('abcd');

mysql> select * from rt where match('abcd');
+---------------------+-------+
| id                  | title |
+---------------------+-------+
| 1514630637682688054 | abcd  |
+---------------------+-------+

mysql> show meta;
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| total         | 1     |
| total_found   | 1     |
| time          | 0.000 |
| keyword[0]    | abc   |
| docs[0]       | 1     |
| hits[0]       | 1     |
+---------------+-------+

mysql> alter table rt charset_table='a,b,c,d';
mysql> select * from rt where match('abcd');
+---------------------+-------+
| id                  | title |
+---------------------+-------+
| 1514630637682688054 | abcd  |
+---------------------+-------+

mysql> show meta
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| total         | 1     |
| total_found   | 1     |
| time          | 0.000 |
| keyword[0]    | abc   |
| docs[0]       | 1     |
| hits[0]       | 1     |
+---------------+-------+

mysql> insert into rt(title) values('abcd');
mysql> select * from rt where match('abcd');
+---------------------+-------+
| id                  | title |
+---------------------+-------+
| 1514630637682688055 | abcd  |
| 1514630637682688054 | abcd  |
+---------------------+-------+

mysql> show meta;
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| total         | 2     |
| total_found   | 2     |
| time          | 0.000 |
| keyword[0]    | abc   |
| docs[0]       | 1     |
| hits[0]       | 1     |
| keyword[1]    | abcd  |
| docs[1]       | 1     |
| hits[1]       | 1     |
+---------------+-------+

Updating table FT settings in plain mode

ALTER TABLE table RECONFIGURE

ALTER can also reconfigure an RT table in the plain mode, so that new tokenization, morphology and other text processing settings from the configuration file take effect for new documents. Note, that the existing document will be left intact. Internally, it forcibly saves the current RAM chunk as a new disk chunk and adjusts the table header, so that new documents are tokenized using the updated full-text settings.

Example
mysql> show table rt settings;
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| settings      |       |
+---------------+-------+
1 row in set (0.00 sec)

mysql> alter table rt reconfigure;
Query OK, 0 rows affected (0.00 sec)

mysql> show table rt settings;
+---------------+----------------------+
| Variable_name | Value                |
+---------------+----------------------+
| settings      | morphology = stem_en |
+---------------+----------------------+
1 row in set (0.00 sec)

Rebuild secondary index

ALTER TABLE table REBUILD SECONDARY

You can also use ALTER to rebuild secondary indexes in a given table. Sometimes, a secondary index can be disabled for the entire table or for one or multiple attributes within the table:

ALTER TABLE table REBUILD SECONDARY rebuilds secondary indexes from attribute data and enables them again.

Additionally, an old version of secondary indexes may be supported but will lack certain features. REBUILD SECONDARY can be used to update secondary indexes.

Example
ALTER TABLE rt REBUILD SECONDARY;
Query OK, 0 rows affected (0.00 sec)
¶ 15

Functions

¶ 15.1

Mathematical functions

ABS()

Returns the absolute value of the argument.

ATAN2()

Returns the arctangent function of two arguments, expressed in radians.

BITDOT()

BITDOT(mask, w0, w1, ...) returns the sum of products of each bit of a mask multiplied by its weight. bit0*w0 + bit1*w1 + ...

CEIL()

Returns the smallest integer value greater than or equal to the argument.

COS()

Returns the cosine of the argument.

CRC32()

Returns the CRC32 value of a string argument.

EXP()

Returns the exponent of the argument (e=2.718... to the power of the argument).

FIBONACCI()

Returns the N-th Fibonacci number, where N is the integer argument. That is, arguments of 0 and up will generate the values 0, 1, 1, 2, 3, 5, 8, 13 and so on. Note that the computations are done using 32-bit integer math and thus numbers 48th and up will be returned modulo 2^32.

FLOOR()

Returns the largest integer value lesser than or equal to the argument.

GREATEST()

GREATEST(attr_json.some_array) function takes a JSON array as the argument, and returns the greatest value in that array. Also works for MVA.

IDIV()

Returns the result of an integer division of the first argument by the second argument. Both arguments must be of an integer type.

LEAST()

LEAST(attr_json.some_array) function takes a JSON array as the argument, and returns the least value in that array. Also works for MVA.

LN()

Returns the natural logarithm of the argument (with the base of e=2.718...).

LOG10()

Returns the common logarithm of the argument (with the base of 10).

LOG2()

Returns the binary logarithm of the argument (with the base of 2).

MAX()

Returns the larger of two arguments.

MIN()

Returns the smaller of two arguments.

POW()

Returns the first argument raised to the power of the second argument.

RAND()

Returns a random float between 0 and 1. It can optionally accept a seed, which can be a constant integer or an integer attribute's name.

If you use a seed, keep in mind that it resets rand()'s starting point separately for each plain table, RT disk, RAM chunk, or pseudo shard. Therefore, queries to a distributed table in any form can return multiple identical random values.

SIN()

Returns the sine of the argument.

SQRT()

Returns the square root of the argument.

¶ 15.2

Searching and ranking functions

BM25A()

BM25A(k1,b) returns the exact BM25A() value. Requires the expr ranker and enabled index_field_lengths. Parameters k1 and b must be floats.

BM25F()

BM25F(k1, b, {field=weight, ...}) returns the exact BM25F() value and requires index_field_lengths to be enabled. The expr ranker is also necessary. Parameters k1 and b must be floats.

EXIST()

Substitutes non-existent columns with default values. It returns either the value of an attribute specified by 'attr-name', or the 'default-value' if that attribute does not exist. STRING or MVA attributes are not supported. This function is useful when searching through multiple tables with different schemas.

SELECT *, EXIST('gid', 6) as cnd FROM i1, i2 WHERE cnd>5

MIN_TOP_SORTVAL()

Returns the sort key value of the worst-ranked element in the current top-N matches if the sort key is a float, and 0 otherwise.

MIN_TOP_WEIGHT()

Returns the weight of the worst-ranked element in the current top-N matches.

PACKEDFACTORS()

PACKEDFACTORS() can be used in queries to display all calculated weighting factors during matching or to provide a binary attribute for creating a custom ranking UDF. This function only works if the expression ranker is specified and the query is not a full scan; otherwise, it returns an error. PACKEDFACTORS() can take an optional argument that disables ATC ranking factor calculation: PACKEDFACTORS({no_atc=1}). Calculating ATC significantly slows down query processing, so this option can be useful if you need to see the ranking factors but don't require ATC. PACKEDFACTORS() can also output in JSON format: PACKEDFACTORS({json=1}). The respective outputs in either key-value pair or JSON format are shown below. (Note that the examples below are wrapped for readability; actual returned values would be single-line.)

mysql> SELECT id, PACKEDFACTORS() FROM test1
    -> WHERE MATCH('test one') OPTION ranker=expr('1') \G

*************************** 1\. row ***************************
             id: 1
packedfactors(): bm25=569, bm25a=0.617197, field_mask=2, doc_word_count=2,
    field1=(lcs=1, hit_count=2, word_count=2, tf_idf=0.152356,
        min_idf=-0.062982, max_idf=0.215338, sum_idf=0.152356, min_hit_pos=4,
        min_best_span_pos=4, exact_hit=0, max_window_hits=1, min_gaps=2,
        exact_order=1, lccs=1, wlccs=0.215338, atc=-0.003974),
    word0=(tf=1, idf=-0.062982),
    word1=(tf=1, idf=0.215338)
1 row in set (0.00 sec)
mysql> SELECT id, PACKEDFACTORS({json=1}) FROM test1
    -> WHERE MATCH('test one') OPTION ranker=expr('1') \G

*************************** 1\. row ***************************
                     id: 1
packedfactors({json=1}):
{

    "bm25": 569,
    "bm25a": 0.617197,
    "field_mask": 2,
    "doc_word_count": 2,
    "fields": [
        {
            "lcs": 1,
            "hit_count": 2,
            "word_count": 2,
            "tf_idf": 0.152356,
            "min_idf": -0.062982,
            "max_idf": 0.215338,
            "sum_idf": 0.152356,
            "min_hit_pos": 4,
            "min_best_span_pos": 4,
            "exact_hit": 0,
            "max_window_hits": 1,
            "min_gaps": 2,
            "exact_order": 1,
            "lccs": 1,
            "wlccs": 0.215338,
            "atc": -0.003974
        }
    ],
    "words": [
        {
            "tf": 1,
            "idf": -0.062982
        },
        {
            "tf": 1,
            "idf": 0.215338
        }
    ]

}
1 row in set (0.01 sec)

This function can be used to implement custom ranking functions in UDFs, as in:

SELECT *, CUSTOM_RANK(PACKEDFACTORS()) AS r
FROM my_index
WHERE match('hello')
ORDER BY r DESC
OPTION ranker=expr('1');

Where CUSTOM_RANK() is a function implemented in a UDF. It should declare a SPH_UDF_FACTORS structure (defined in sphinxudf.h), initialize this structure, unpack the factors into it before usage, and deinitialize it afterwards, as follows:

SPH_UDF_FACTORS factors;
sphinx_factors_init(&factors);
sphinx_factors_unpack((DWORD*)args->arg_values[0], &factors);
// ... can use the contents of factors variable here ...
sphinx_factors_deinit(&factors);

PACKEDFACTORS() data is available at all query stages, not just during the initial matching and ranking pass. This enables another particularly interesting application of PACKEDFACTORS(): re-ranking.

In the example above, we used an expression-based ranker with a dummy expression and sorted the result set by the value computed by our UDF. In other words, we used the UDF to rank all our results. Now, let's assume for the sake of an example that our UDF is extremely expensive to compute, with a throughput of only 10,000 calls per second. If our query matches 1,000,000 documents, we would want to use a much simpler expression to do most of our ranking in order to maintain reasonable performance. Then, we would apply the expensive UDF to only a few top results, say, the top 100 results. In other words, we would build the top 100 results using a simpler ranking function and then re-rank those with a more complex one. This can be done with subselects:

SELECT * FROM (
    SELECT *, CUSTOM_RANK(PACKEDFACTORS()) AS r
    FROM my_index WHERE match('hello')
    OPTION ranker=expr('sum(lcs)*1000+bm25')
    ORDER BY WEIGHT() DESC
    LIMIT 100
) ORDER BY r DESC LIMIT 10

In this example, the expression-based ranker is called for every matched document to compute WEIGHT(), so it gets called 1,000,000 times. However, the UDF computation can be postponed until the outer sort, and it will only be performed for the top 100 matches by WEIGHT(), according to the inner limit. This means the UDF will only be called 100 times. Finally, the top 10 matches by UDF value are selected and returned to the application.

For reference, in a distributed setup, the PACKEDFACTORS() data is sent from the agents to the master node in binary format. This makes it technically feasible to implement additional re-ranking passes on the master node if needed.

When used in SQL but not called from any UDFs, the result of PACKEDFACTORS() is formatted as plain text, which can be used to manually assess the ranking factors. Note that this feature is not currently supported by the Manticore API.

REMOVE_REPEATS()

REMOVE_REPEATS ( result_set, column, offset, limit ) - removes repeated adjusted rows with the same 'column' value.

SELECT REMOVE_REPEATS((SELECT * FROM dist1), gid, 0, 10)

WEIGHT()

The WEIGHT() function returns the calculated matching score. If no ordering is specified, the result is sorted in descending order by the score provided by WEIGHT(). In this example, we order first by weight and then by an integer attribute.

The search above performs a simple matching, where all words need to be present. However, we can do more (and this is just a simple example):

mysql> SELECT *,WEIGHT() FROM testrt WHERE MATCH('"list of business laptops"/3');
+------+------+-------------------------------------+---------------------------+----------+
| id   | gid  | title                               | content                   | weight() |
+------+------+-------------------------------------+---------------------------+----------+
|    1 |   10 | List of HP business laptops         | Elitebook Probook         |     2397 |
|    2 |   10 | List of Dell business laptops       | Latitude Precision Vostro |     2397 |
|    3 |   20 | List of Dell gaming laptops         | Inspirion Alienware       |     2375 |
|    5 |   30 | List of ASUS ultrabooks and laptops | Zenbook Vivobook          |     2375 |
+------+------+-------------------------------------+---------------------------+----------+
4 rows in set (0.00 sec)


mysql> SHOW META;
+----------------+----------+
| Variable_name  | Value    |
+----------------+----------+
| total          | 4        |
| total_found    | 4        |
| total_relation | eq       |
| time           | 0.000    |
| keyword[0]     | list     |
| docs[0]        | 5        |
| hits[0]        | 5        |
| keyword[1]     | of       |
| docs[1]        | 4        |
| hits[1]        | 4        |
| keyword[2]     | business |
| docs[2]        | 2        |
| hits[2]        | 2        |
| keyword[3]     | laptops  |
| docs[3]        | 5        |
| hits[3]        | 5        |
+----------------+----------+
16 rows in set (0.00 sec)

Here, we search for four words, but a match can occur even if only three of the four words are found. The search will rank documents containing all words higher.

ZONESPANLIST()

The ZONESPANLIST() function returns pairs of matched zone spans. Each pair contains the matched zone span identifier, a colon, and the order number of the matched zone span. For example, if a document reads <emphasis role="bold"><i>text</i> the <i>text</i></emphasis>, and you query for 'ZONESPAN:(i,b) text', then ZONESPANLIST() will return the string "1:1 1:2 2:1", meaning that the first zone span matched "text" in spans 1 and 2, and the second zone span in span 1 only.

QUERY()

QUERY() returns the current search query. QUERY() is a postlimit expression and is intended to be used with SNIPPET().

Table functions are a mechanism for post-query result set processing. Table functions take an arbitrary result set as input and return a new, processed set as output. The first argument should be the input result set, but a table function can optionally take and handle more arguments. Table functions can completely change the result set, including the schema. Currently, only built-in table functions are supported. Table functions work for both outer SELECT and nested SELECT.

¶ 15.3

Type Casting Functions

Type casting comprises three principal actions: conversion, reinterpretation, and promotion.

BIGINT()

This function promotes an integer argument to a 64-bit type, leaving floating-point arguments untouched. It's designed to ensure the evaluation of specific expressions (such as a*b) in 64-bit mode, even if all arguments are 32-bit.

DOUBLE()

The DOUBLE() function promotes its argument to a floating-point type. This is designed to help enforce the evaluation of numeric JSON fields.

INTEGER()

The INTEGER() function promotes its argument to a 64-bit signed type. This is designed to enforce the evaluation of numeric JSON fields.

TO_STRING()

This function forcefully converts its argument to a string type.

UINT()

The UINT() function promotes its argument to a 32-bit unsigned integer type.

UINT64()

The UINT64() function promotes its argument to a 64-bit unsigned integer type.

SINT()

The SINT() function forcefully reinterprets its 32-bit unsigned integer argument as signed and extends it to a 64-bit type (since the 32-bit type is unsigned). For instance, 1-2 ordinarily evaluates to 4294967295, but SINT(1-2) evaluates to -1.

¶ 15.4

Arrays and conditions functions

ALL()

ALL(cond FOR var IN json.array) applies to JSON arrays and returns 1 if the condition is true for all elements in the array and 0 otherwise. cond is a general expression that can also use var as the current value of an array element within itself.

ALL() with json
select *, ALL(x>0 AND x<4 FOR x IN j.ar) from tbl
+------+--------------+--------------------------------+
| id   | j            | all(x>0 and x<4 for x in j.ar) |
+------+--------------+--------------------------------+
|    1 | {"ar":[1,3]} |                              1 |
|    2 | {"ar":[3,7]} |                              0 |
+------+--------------+--------------------------------+
2 rows in set (0.00 sec)
ALL() with json ex. 2
select *, ALL(x>0 AND x<4 FOR x IN j.ar) cond from tbl where cond=1
+------+--------------+------+
| id   | j            | cond |
+------+--------------+------+
|    1 | {"ar":[1,3]} |    1 |
+------+--------------+------+
1 row in set (0.00 sec)

ALL(mva) is a special constructor for multi-value attributes. When used with comparison operators (including comparison with IN()), it returns 1 if all values from the MVA attribute are found among the compared values.

ALL() with MVA
select * from tbl where all(m) >= 1
+------+------+
| id   | m    |
+------+------+
|    1 | 1,3  |
|    2 | 3,7  |
+------+------+
2 rows in set (0.00 sec)
ALL() with MVA and IN()
select * from tbl where all(m) in (1, 3, 7, 10)
+------+------+
| id   | m    |
+------+------+
|    1 | 1,3  |
|    2 | 3,7  |
+------+------+
2 rows in set (0.00 sec)

To compare an MVA attribute with an array, avoid using <mva> NOT ALL(); use ALL(<mva>) NOT IN() instead.

ALL() with MVA and NOT IN()
select * from tbl where all(m) not in (2, 4)
+------+------+
| id   | m    |
+------+------+
|    1 | 1,3  |
|    2 | 3,7  |
+------+------+
2 rows in set (0.00 sec)

ALL(string list) is a special operation for filtering string tags.

If all of the words enumerated as arguments of ALL() are present in the attribute, the filter matches. The optional NOT inverts the logic.

This filter internally uses doc-by-doc matching, so in the case of a full scan query, it might be slower than expected. It is intended for attributes that are not indexed, like calculated expressions or tags in PQ tables. If you need such filtering, consider the solution of putting the string attribute as a full-text field, and then use the full-text operator match(), which will invoke a full-text search.

ALL() with strings
select * from tbl where tags all('bug', 'release')
+------+---------------------------+
| id   | tags                      |
+------+---------------------------+
|    1 | bug priority_high release |
|    2 | bug priority_low release  |
+------+---------------------------+
2 rows in set (0.00 sec)
ALL() with strings and NOT
mysql> select * from tbl

+------+---------------------------+
| id   | tags                      |
+------+---------------------------+
|    1 | bug priority_high release |
|    2 | bug priority_low release  |
+------+---------------------------+
2 rows in set (0.00 sec)

mysql> select * from tbl where tags not all('bug')

Empty set (0.00 sec)

ANY()

ANY(cond FOR var IN json.array) applies to JSON arrays and returns 1 if the condition is true for any element in the array and 0 otherwise. cond is a general expression that can also use var as the current value of an array element within itself.

ANY() with json
select *, ANY(x>5 AND x<10 FOR x IN j.ar) from tbl
+------+--------------+---------------------------------+
| id   | j            | any(x>5 and x<10 for x in j.ar) |
+------+--------------+---------------------------------+
|    1 | {"ar":[1,3]} |                               0 |
|    2 | {"ar":[3,7]} |                               1 |
+------+--------------+---------------------------------+
2 rows in set (0.00 sec)
ANY() with json ex. 2
select *, ANY(x>5 AND x<10 FOR x IN j.ar) cond from tbl where cond=1
+------+--------------+------+
| id   | j            | cond |
+------+--------------+------+
|    2 | {"ar":[3,7]} |    1 |
+------+--------------+------+
1 row in set (0.00 sec)

ANY(mva) is a special constructor for multi-value attributes. When used with comparison operators (including comparison with IN()), it returns 1 if any of the MVA values is found among the compared values.

When comparing an array using IN(), ANY() is assumed by default if not otherwise specified, but a warning will be issued regarding the missing constructor.

ANY() with MVA
mysql> select * from tbl

+------+------+
| id   | m    |
+------+------+
|    1 | 1,3  |
|    2 | 3,7  |
+------+------+
2 rows in set (0.01 sec)

mysql> select * from tbl where any(m) > 5

+------+------+
| id   | m    |
+------+------+
|    2 | 3,7  |
+------+------+
1 row in set (0.00 sec)
ANY() with MVA and IN()
select * from tbl where any(m) in (1, 7, 10)
+------+------+
| id   | m    |
+------+------+
|    1 | 1,3  |
|    2 | 3,7  |
+------+------+
2 rows in set (0.00 sec)

To compare an MVA attribute with an array, avoid using <mva> NOT ANY(); use <mva> NOT IN() instead or ANY(<mva>) NOT IN().

ANY() with MVA and NOT IN()
mysql> select * from tbl

+------+------+
| id   | m    |
+------+------+
|    1 | 1,3  |
|    2 | 3,7  |
+------+------+
2 rows in set (0.00 sec)

mysql> select * from tbl where any(m) not in (1, 3, 5)

+------+------+
| id   | m    |
+------+------+
|    2 | 3,7  |
+------+------+
1 row in set (0.00 sec)

ANY(string list) is a special operation for filtering string tags.

If any of the words enumerated as arguments of ANY() is present in the attribute, the filter matches. The optional NOT inverts the logic.

This filter internally uses doc-by-doc matching, so in the case of a full scan query, it might be slower than expected. It is intended for attributes that are not indexed, like calculated expressions or tags in PQ tables. If you need such filtering, consider the solution of putting the string attribute as a full-text field, and then use the full-text operator match(), which will invoke a full-text search.

ANY() with strings
select * from tbl where tags any('bug', 'feature')
+------+---------------------------+
| id   | tags                      |
+------+---------------------------+
|    1 | bug priority_high release |
|    2 | bug priority_low release  |
+------+---------------------------+
2 rows in set (0.00 sec)
ANY() with strings and NOT
select * from tbl
--------------

+------+---------------------------+
| id   | tags                      |
+------+---------------------------+
|    1 | bug priority_high release |
|    2 | bug priority_low release  |
+------+---------------------------+
2 rows in set (0.00 sec)

--------------
select * from tbl where tags not any('feature', 'priority_low')
--------------

+------+---------------------------+
| id   | tags                      |
+------+---------------------------+
|    1 | bug priority_high release |
+------+---------------------------+
1 row in set (0.01 sec)

CONTAINS()

CONTAINS(polygon, x, y) checks whether the (x,y) point is within the given polygon, and returns 1 if true, or 0 if false. The polygon has to be specified using either the POLY2D() function. The former function is intended for "small" polygons, meaning less than 500 km (300 miles) a side, and it doesn't take into account the Earth's curvature for speed. For larger distances, you should use GEOPOLY2D, which tessellates the given polygon in smaller parts, accounting for the Earth's curvature.

IF()

The behavior of IF() is slightly different from its MySQL counterpart. It takes 3 arguments, checks whether the 1st argument is equal to 0.0, returns the 2nd argument if it is not zero, or the 3rd one when it is. Note that unlike comparison operators, IF() does not use a threshold! Therefore, it's safe to use comparison results as its 1st argument, but arithmetic operators might produce unexpected results. For instance, the following two calls will produce different results even though they are logically equivalent:

IF()
IF ( sqrt(3)*sqrt(3)-3<>0, a, b )
IF ( sqrt(3)*sqrt(3)-3, a, b )

In the first case, the comparison operator <> will return 0.0 (false) due to a threshold, and IF() will always return ** as a result. In the second case, the same sqrt(3)*sqrt(3)-3 expression will be compared with zero without a threshold by the IF() function itself. However, its value will be slightly different from zero due to limited floating-point calculation precision. Because of this, the comparison with 0.0 done by IF() will not pass, and the second variant will return 'a' as a result.

HISTOGRAM()

HISTOGRAM(expr, {hist_interval=size, hist_offset=value}) takes a bucket size and returns the bucket number for the value. The key function is:

key_of_the_bucket = interval + offset * floor ( ( value - offset ) / interval )

The histogram argument interval must be positive. The histogram argument offset must be positive and less than interval. It is used in aggregation, FACET, and grouping.

Example:

HISTOGRAM()
SELECT COUNT(*),
HISTOGRAM(price, {hist_interval=100}) as price_range
FROM facets
GROUP BY price_range ORDER BY price_range ASC;

IN()

IN(expr,val1,val2,...) takes 2 or more arguments and returns 1 if the 1st argument (expr) is equal to any of the other arguments (val1..valN), or 0 otherwise. Currently, all the checked values (but not the expression itself) are required to be constant. The constants are pre-sorted, and binary search is used, so IN() even against a large arbitrary list of constants will be very quick. The first argument can also be an MVA attribute. In that case, IN() will return 1 if any of the MVA values are equal to any of the other arguments. IN() also supports IN(expr,@uservar) syntax to check whether the value belongs to the list in the given global user variable. The first argument can be a JSON attribute.

INDEXOF()

INDEXOF(cond FOR var IN json.array) function iterates through all elements in the array and returns the index of the first element for which 'cond' is true, and -1 if 'cond' is false for every element in the array.

INTERVAL()

INTERVAL(expr,point1,point2,point3,...) takes 2 or more arguments and returns the index of the argument that is less than the first argument: it returns 0 if expr<point1, 1 if point1<=expr<point2, and so on. It is required that point1<point2<...<pointN for this function to work correctly.

LENGTH()

LENGTH(attr_mva) function returns the number of elements in an MVA set. It works with both 32-bit and 64-bit MVA attributes. LENGTH(attr_json) returns the length of a field in JSON. The return value depends on the type of field. For example, LENGTH(json_attr.some_int) always returns 1, and LENGTH(json_attr.some_array) returns the number of elements in the array. LENGTH(string_expr) function returns the length of the string resulting from an expression.
TO_STRING() must enclose the expression, regardless of whether the expression returns a non-string or it's simply a string attribute.

RANGE()

RANGE(expr, {range_from=value,range_to=value}) takes a set of ranges and returns the bucket number for the value.
This expression includes the range_from value and excludes the range_to value for each range. A range can be open - having only the range_from or only the range_to value. It is used in aggregation, FACET, and grouping.

Example:

RANGE()
SELECT COUNT(*),
RANGE(price, {range_to=150},{range_from=150,range_to=300},{range_from=300}) price_range
FROM facets
GROUP BY price_range ORDER BY price_range ASC;

REMAP()

REMAP(condition, expression, (cond1, cond2, ...), (expr1, expr2, ...)) function allows you to make some exceptions to expression values depending on condition values. The condition expression should always result in an integer, while the expression can result in an integer or float.

Example:

REMAP()
SELECT id, size, REMAP(size, 15, (5,6,7,8), (1,1,2,2)) s
FROM products
ORDER BY s ASC;
Another example
SELECT REMAP(userid, karmapoints, (1, 67), (999, 0)) FROM users;
SELECT REMAP(id%10, salary, (0), (0.0)) FROM employes;

This will put documents with sizes 5 and 6 first, followed by sizes 7 and 8. In case there's an original value not listed in the array (e.g. size 10), it will default to 15, and in this case, will be placed at the end.

¶ 15.5

Date and time functions

Note, that CURTIME(), UTC_TIME(), UTC_TIMESTAMP(), and TIMEDIFF() can be promoted to numeric types using arbitrary conversion functions such as BIGINT(), DOUBLE(), etc.

NOW()

Returns the current timestamp as an INTEGER.

SQL
select NOW();
+------------+
| NOW()      |
+------------+
| 1615788407 |
+------------+

CURTIME()

Returns the current time in the local timezone in hh:ii:ss format.

SQL
select CURTIME();
+-----------+
| CURTIME() |
+-----------+
| 07:06:30  |
+-----------+

CURDATE()

Returns the current date in the local timezone in YYYY-MM-DD format.

SQL
select curdate();
+------------+
| curdate()  |
+------------+
| 2023-08-02 |
+------------+

UTC_TIME()

Returns the current time in UTC timezone in hh:ii:ss format.

SQL
select UTC_TIME();
+------------+
| UTC_TIME() |
+------------+
| 06:06:18   |
+------------+

UTC_TIMESTAMP()

Returns the current time in UTC timezone in YYYY-MM-DD hh:ii:ss format.

SQL
select UTC_TIMESTAMP();
+---------------------+
| UTC_TIMESTAMP()     |
+---------------------+
| 2021-03-15 06:06:03 |
+---------------------+

SECOND()

Returns the integer second (in 0..59 range) from a timestamp argument, according to the current timezone.

SQL
select second(now());
+---------------+
| second(now()) |
+---------------+
| 52            |
+---------------+

MINUTE()

Returns the integer minute (in 0..59 range) from a timestamp argument, according to the current timezone.

SQL
select minute(now());
+---------------+
| minute(now()) |
+---------------+
| 5             |
+---------------+

HOUR()

Returns the integer hour (in 0..23 range) from a timestamp argument, according to the current timezone.

SQL
select hour(now());
+-------------+
| hour(now()) |
+-------------+
| 7           |
+-------------+

DAY()

Returns the integer day of the month (in 1..31 range) from a timestamp argument, according to the current timezone.

SQL
select day(now());
+------------+
| day(now()) |
+------------+
| 15         |
+------------+

MONTH()

Returns the integer month (in 1..12 range) from a timestamp argument, according to the current timezone.

SQL
select month(now());
+--------------+
| month(now()) |
+--------------+
| 3            |
+--------------+

QUARTER()

Returns the integer quarter of the year (in 1..4 range) from a timestamp argument, according to the current timezone.

SQL
select quarter(now());
+----------------+
| quarter(now()) |
+----------------+
| 2              |
+----------------+

YEAR()

Returns the integer year (in 1969..2038 range) from a timestamp argument, according to the current timezone.

SQL
select year(now());
+-------------+
| year(now()) |
+-------------+
| 2024        |
+-------------+

DAYNAME()

Returns the weekday name for a given timestamp argument, according to the current timezone.

SQL
select dayname(now());
+----------------+
| dayname(now()) |
+----------------+
| Wednesday      |
+----------------+

MONTHNAME()

Returns the name of the month for a given timestamp argument, according to the current timezone.

SQL
select monthname(now());
+------------------+
| monthname(now()) |
+------------------+
| August           |
+------------------+

DAYOFWEEK()

Returns the integer weekday index (in 1..7 range) for a given timestamp argument, according to the current timezone.
Note that the week starts on Sunday.

SQL
select dayofweek(now());
+------------------+
| dayofweek(now()) |
+------------------+
| 5                |
+------------------+

DAYOFYEAR()

Returns the integer day of the year (in 1..366 range) for a given timestamp argument, according to the current timezone.

SQL
select dayofyear(now());
+------------------+
| dayofyear(now()) |
+------------------+
|              214 |
+------------------+

YEARWEEK()

Returns the integer year and the day code of the first day of current week (in 1969001..2038366 range) for a given timestamp argument, according to the current timezone.

SQL
select yearweek(now());
+-----------------+
| yearweek(now()) |
+-----------------+
|         2023211 |
+-----------------+

YEARMONTH()

Returns the integer year and month code (in 196912..203801 range) from a timestamp argument, according to the current timezone.

SQL
select yearmonth(now());
+------------------+
| yearmonth(now()) |
+------------------+
| 202103           |
+------------------+

YEARMONTHDAY()

Returns the integer year, month, and date code (ranging from 19691231 to 20380119) based on the current timezone.

SQL
select yearmonthday(now());
+---------------------+
| yearmonthday(now()) |
+---------------------+
| 20210315            |
+---------------------+

TIMEDIFF()

Calculates the difference between two timestamps in the format hh:ii:ss.

SQL
select timediff(1615787586, 1613787583);
+----------------------------------+
| timediff(1615787586, 1613787583) |
+----------------------------------+
| 555:33:23                        |
+----------------------------------+

DATEDIFF()

Calculates the number of days between two given timestamps.

SQL
select datediff(1615787586, 1613787583);
+----------------------------------+
| datediff(1615787586, 1613787583) |
+----------------------------------+
|                               23 |
+----------------------------------+

DATE()

Formats the date part from a timestamp argument as a string in YYYY-MM-DD format.

SQL
select date(now());
+-------------+
| date(now()) |
+-------------+
| 2023-08-02  |
+-------------+

TIME()

Formats the time part from a timestamp argument as a string in HH:MM:SS format.

SQL
select time(now());
+-------------+
| time(now()) |
+-------------+
| 15:21:27    |
+-------------+

DATE_FORMAT()

Returns a formatted string based on the provided date and format arguments. The format argument uses the same specifiers as the strftime function. For convenience, here are some common format specifiers:

Note that this is not a complete list of the specifiers. Please consult the documentation for strftime() for your operating system to get the full list.

SQL
SELECT DATE_FORMAT(NOW(), 'year %Y and time %T');
+------------------------------------------+
| DATE_FORMAT(NOW(), 'year %Y and time %T') |
+------------------------------------------+
| year 2023 and time 11:54:52              |
+------------------------------------------+

This example formats the current date and time, displaying the four-digit year and the time in 24-hour format.

DATE_HISTOGRAM()

DATE_HISTOGRAM(expr, {calendar_interval='unit_name'}) Takes a bucket size as a unit name and returns the bucket number for the value. Values are rounded to the closest bucket. The key function is:

key_of_the_bucket = interval * floor ( value / interval )

Intervals are specified using the unit name, such as week or as a single unit like 1M. Multiple units such as 2w are not supported.

The valid intervals are:

Used in aggregation, FACET, and grouping.

Example:

SELECT COUNT(*),
DATE_HISTOGRAM(tm, {calendar_interval='month'}) AS months
FROM facets
GROUP BY months ORDER BY months ASC;

DATE_RANGE()

DATE_RANGE(expr, {range_from='date_math', range_to='date_math'}) takes a set of ranges and returns the bucket number for the value.
The expression includes the range_from value and excludes the range_to value for each range. The range can be open - having only the range_from or only the range_to value.
The difference between this and the RANGE() function is that the range_from and range_to values can be expressed in Date math expressions.

Used in aggregation, FACET, and grouping.

Example:

SELECT COUNT(*),
DATE_RANGE(tm, {range_to='2017||+2M/M'},{range_from='2017||+2M/M',range_to='2017||+5M/M'},{range_from='2017||+5M/M'}) AS points
FROM idx_dates
GROUP BY points ORDER BY points ASC;

Date math lets you work with dates and times directly in your searches. It's especially useful for handling data that changes over time. With date math, you can easily do things like find entries from a certain period, analyze data trends, or manage when information should be removed. It simplifies working with dates by letting you add or subtract time from a given date, round dates to the nearest time unit, and more, all within your search queries.

To use date math, you start with a base date, which can be:
- now for the current date and time,
- or a specific date string ending with ||.

Then, you can modify this date with operations like:
- +1y to add one year,
- -1h to subtract one hour,
- /m to round to the nearest month.

You can use these units in your operations:
- s for seconds,
- m for minutes,
- h (or H) for hours,
- d for days,
- w for weeks,
- M for months,
- y for years.

Here are some examples of how you might use date math:
- now+4h means four hours from now.
- now-2d/d is the time two days ago, rounded to the nearest day.
- 2010-04-20||+2M/d is June 20, 2010, rounded to the nearest day.

¶ 15.6

Geo spatial functions

GEODIST()

GEODIST(lat1, lon1, lat2, lon2, [...]) function calculates the geosphere distance between two points specified by their coordinates. Note that by default, both latitudes and longitudes must be in radians, and the result will be in meters. You can use arbitrary expressions for any of the four coordinates. An optimized path will be chosen when one pair of arguments directly refers to a pair of attributes, and the other one is constant.

GEODIST() also accepts an optional 5th argument, allowing you to easily convert between input and output units and select the specific geodistance formula to use. The complete syntax and a few examples are as follows:

GEODIST(lat1, lon1, lat2, lon2, { option=value, ... })

GEODIST(40.7643929, -73.9997683, 40.7642578, -73.9994565, {in=degrees, out=feet})

GEODIST(51.50, -0.12, 29.98, 31.13, {in=deg, out=mi})

The known options and their values are:

The default method is "adaptive". It is a well-optimized implementation that is both more precise and much faster at all times than "haversine".

GEOPOLY2D()

GEOPOLY2D(lat1,lon1,lat2,lon2,lat3,lon3...) creates a polygon to be used with the CONTAINS() function. This function takes into account the Earth's curvature by tessellating the polygon into smaller ones, and should be used for larger areas. For small areas, the POLY2D() function can be used instead. The function expects coordinates to be pairs of latitude/longitude coordinates in degrees; if radians are used, it will give the same result as POLY2D().

POLY2D()

POLY2D(x1,y1,x2,y2,x3,y3...) creates a polygon to be used with the CONTAINS() function. This polygon assumes a flat Earth, so it should not be too large; for large areas, the GEOPOLY2D() function, which takes Earth's curvature into consideration, should be used.

¶ 15.7

String functions

CONCAT()

Concatenates two or more strings into one. Non-string arguments must be explicitly converted to string using the TO_STRING() function.

CONCAT(TO_STRING(float_attr), ',', TO_STRING(int_attr), ',', title)

LEVENSHTEIN()

LEVENSHTEIN ( pattern, source, {normalize=0, length_delta=0}) returns number (Levenshtein distance) of single-character edits (insertions, deletions or substitutions) between pattern and source strings required to make in pattern to make it source.

SELECT LEVENSHTEIN('gily', attr1) AS dist, WEIGHT() AS w FROM test WHERE MATCH('test') ORDER BY w DESC, dist ASC;
SELECT LEVENSHTEIN('gily', j.name, {length_delta=6}) AS dist, WEIGHT() AS w FROM test WHERE MATCH('test') ORDER BY w DESC;
SELECT LEVENSHTEIN(title, j.name, {normalize=1}) AS dist, WEIGHT() AS w FROM test WHERE MATCH ('test') ORDER BY w DESC, dist ASC;

REGEX()

The REGEX(attr,expr) function returns 1 if a regular expression matches the attribute's string, and 0 otherwise. It works with both string and JSON attributes.

SELECT REGEX(content, 'box?') FROM test;
SELECT REGEX(j.color, 'red | pink') FROM test;

Expressions should adhere to the RE2 syntax. To perform a case-insensitive search, for instance, you can use:

SELECT REGEX(content, '(?i)box') FROM test;

SNIPPET()

The SNIPPET() function can be used to highlight search results within a given text. The first two arguments are: the text to be highlighted, and a query. Options can be passed to the function as the third, fourth, and so on arguments. SNIPPET() can obtain the text for highlighting directly from the table. In this case, the first argument should be the field name:

SELECT SNIPPET(body, QUERY()) FROM myIndex WHERE MATCH('my.query')

In this example, the QUERY() expression returns the current full-text query. SNIPPET() can also highlight non-indexed text:

mysql  SELECT id, SNIPPET('text to highlight', 'my.query', 'limit=100') FROM myIndex WHERE MATCH('my.query')

Additionally, it can be used to highlight text fetched from other sources using a User-Defined Function (UDF):

SELECT id, SNIPPET(myUdf(id), 'my.query', 'limit=100') FROM myIndex WHERE MATCH('my.query')

In this context, myUdf() is a User-Defined Function (UDF) that retrieves a document by its ID from an external storage source. The SNIPPET() function is considered a "post limit" function, which means that the computation of snippets is delayed until the entire final result set is prepared, and even after the LIMIT clause has been applied. For instance, if a LIMIT 20,10 clause is used, SNIPPET() will be called no more than 10 times.

It is important to note that SNIPPET() does not support field-based limitations. For this functionality, use HIGHLIGHT() instead.

SUBSTRING_INDEX()

SUBSTRING_INDEX(string, delimiter, number) returns a substring of the original string, based on a specified number of delimiter occurrences:

SUBSTRING_INDEX() by default returns a string, but it can also be coerced into other types (such as integer or float) if necessary. Numeric values can be converted using specific functions (such as BIGINT(), DOUBLE(), etc.).

SQL
SELECT SUBSTRING_INDEX('www.w3schools.com', '.', 2) FROM test;
SELECT SUBSTRING_INDEX(j.coord, ' ', 1) FROM test;
SELECT          SUBSTRING_INDEX('1.2 3.4', ' ',  1);  /* '1.2' */
SELECT          SUBSTRING_INDEX('1.2 3.4', ' ', -1);  /* '3.4' */
SELECT sint (   SUBSTRING_INDEX('1.2 3.4', ' ',  1)); /* 1 */
SELECT sint (   SUBSTRING_INDEX('1.2 3.4', ' ', -1)); /* 3 */
SELECT double ( SUBSTRING_INDEX('1.2 3.4', ' ',  1)); /* 1.200000 */
SELECT double ( SUBSTRING_INDEX('1.2 3.4', ' ', -1)); /* 3.400000 */

UPPER() and LOWER()

UPPER(string) convert argument to upper case, LOWER(string) convert argument to lower case.

Result also can be promoted to numeric, but only if string argument is convertible to a number. Numeric values could be promoted with arbitrary functions (BITINT, DOUBLE, etc.).

SELECT upper('www.w3schools.com', '.', 2); /* WWW.W3SCHOOLS.COM  */
SELECT double (upper ('1.2e3')); /* 1200.000000 */
SELECT integer (lower ('12345')); /* 12345 */
¶ 15.8

Other functions

LAST_INSERT_ID()

Returns the IDs of documents that were inserted or replaced by the last statement in the current session.

The same value can also be obtained via the @@session.last_insert_id variable.

mysql> select @@session.last_insert_id;
+--------------------------+
| @@session.last_insert_id |
+--------------------------+
| 11,32                    |
+--------------------------+
1 rows in set

mysql> select LAST_INSERT_ID();
+------------------+
| LAST_INSERT_ID() |
+------------------+
| 25,26,29         |
+------------------+
1 rows in set   

CONNECTION_ID()

Returns the current connection ID.

mysql> select CONNECTION_ID();
+-----------------+
| CONNECTION_ID() |
+-----------------+
| 6               |
+-----------------+
1 row in set (0.00 sec)

KNN_DIST()

Returns KNN vector search distance.

mysql> select id, knn_dist() from test where knn ( image_vector, 5, (0.286569,-0.031816,0.066684,0.032926) ) and match('white') and id < 10;
+------+------------+
| id   | knn_dist() |
+------+------------+
|    2 | 0.81527930 |
+------+------------+
1 row in set (0.00 sec)
¶ 16

Securing and compacting a table

¶ 16.1

Backup and Restore

Backing up your tables on a regular basis is essential for recovery in the event of system crashes, hardware failure, or data corruption/loss. It's also highly recommended to make backups before upgrading to a new Manticore Search version or running ALTER TABLE.

Backing up database systems can be done in two unique ways: logical and physical backups. Each of these methods has its pros and cons, which may vary based on the specific database environment and needs. Here, we'll delve into the distinction between these two types of backups.

Logical Backups

Logical backups entail exporting the database schema and data as SQL statements or as data formats specific to the database. This backup form is typically readable by humans and can be employed to restore the database on various systems or database engines.

Pros and cons of logical backups:
- ➕ Portability: Logical backups are generally more portable than physical backups, as they can be used to restore the database on different hardware or operating systems.
- ➕ Flexibility: Logical backups allow you to selectively restore specific tables, indexes, or other database objects.
- ➕ Compatibility: Logical backups can be used to migrate data between different database management systems or versions, provided the target system supports the exported format or SQL statements.
- ➖ Slower Backup and Restore: Logical backups can be slower than physical backups, as they require the database engine to convert the data into SQL statements or another export format.
- ➖ Increased System Load: Creating logical backups can cause higher system load, as the process requires more CPU and memory resources to process and export the data.

Manticore Search supports mysqldump for logical backups.

Physical Backups

Physical backups involve copying the raw data files and system files that comprise the database. This type of backup essentially creates a snapshot of the database's physical state at a given point in time.

Pros and cons of physical backups:
- ➕ Speed: Physical backups are usually faster than logical backups, as they involve copying raw data files directly from disk.
- ➕ Consistency: Physical backups ensure a consistent backup of the entire database, as all related files are copied together.
- ➕ Lower System Load: Creating physical backups generally places less load on the system compared to logical backups, as the process does not involve additional data processing.
- ➖ Portability: Physical backups are typically less portable than logical backups, as they may be dependent on the specific hardware, operating system, or database engine configuration.
- ➖ Flexibility: Physical backups do not allow for the selective restoration of specific database objects, as the backup contains the entire database's raw files.
- ➖ Compatibility: Physical backups cannot be used to migrate data between different database management systems or versions, as the raw data files may not be compatible across different platforms or software.

Manticore Search has manticore-backup command line tool for physical backups.

In summary, logical backups provide more flexibility, portability, and compatibility but can be slower and more resource-intensive, while physical backups are faster, more consistent, and less resource-intensive but may be limited in terms of portability and flexibility. The choice between these two backup methods will depend on your specific database environment, hardware, and requirements.

Using manticore-backup command line tool

The manticore-backup tool, included in the official Manticore Search packages, automates the process of backing up tables for an instance running in RT mode.

Installation

If you followed the official installation instructions, you should already have everything installed and don't need to worry. Otherwise, manticore-backup requires PHP 8.1.10 and specific modules or manticore-executor, which is a part of the manticore-extra package, and you need to ensure that one of these is available.

Note that manticore-backup is not available for Windows yet.

How to use

First, make sure you're running manticore-backup on the same server where the Manticore instance you are about to back up is running.

Second, we recommend running the tool under the root user so the tool can transfer ownership of the files you are backing up. Otherwise, a backup will be also made but with no ownership transfer. In either case, you should make sure that manticore-backup has access to the data dir of the Manticore instance.

The only required argument for manticore-backup is --backup-dir, which specifies the destination for the backup. If you don't provide any additional arguments, manticore-backup will:
- locate a Manticore instance running with the default configuration
- create a subdirectory in the --backup-dir directory with a timestamped name
- backup all tables found in the instance

Example
manticore-backup --config=path/to/manticore.conf --backup-dir=backupdir
Copyright (c) 2023-2024, Manticore Software LTD (https://manticoresearch.com)

Manticore config file: /etc/manticoresearch/manticore.conf
Tables to backup: all tables
Target dir: /mnt/backup/

Manticore config
  endpoint =  127.0.0.1:9308

Manticore versions:
  manticore: 5.0.2
  columnar: 1.15.4
  secondary: 1.15.4
2022-10-04 17:18:39 [Info] Starting the backup...
2022-10-04 17:18:39 [Info] Backing up config files...
2022-10-04 17:18:39 [Info]   config files - OK
2022-10-04 17:18:39 [Info] Backing up tables...
2022-10-04 17:18:39 [Info]   pq (percolate) [425B]...
2022-10-04 17:18:39 [Info]    OK
2022-10-04 17:18:39 [Info]   products (rt) [512B]...
2022-10-04 17:18:39 [Info]    OK
2022-10-04 17:18:39 [Info] Running sync
2022-10-04 17:18:42 [Info]  OK
2022-10-04 17:18:42 [Info] You can find backup here: /mnt/backup/backup-20221004171839
2022-10-04 17:18:42 [Info] Elapsed time: 2.76s
2022-10-04 17:18:42 [Info] Done

To back up specific tables only, use the --tables flag followed by a comma-separated list of tables, for example --tables=tbl1,tbl2. This will only backup the specified tables and ignore the rest.

Example
manticore-backup --backup-dir=/mnt/backup/ --tables=products
Copyright (c) 2023-2024, Manticore Software LTD (https://manticoresearch.com)

Manticore config file: /etc/manticoresearch/manticore.conf
Tables to backup: products
Target dir: /mnt/backup/

Manticore config
  endpoint =  127.0.0.1:9308

Manticore versions:
  manticore: 5.0.3
  columnar: 1.16.1
  secondary: 0.0.0
2022-10-04 17:25:02 [Info] Starting the backup...
2022-10-04 17:25:02 [Info] Backing up config files...
2022-10-04 17:25:02 [Info]   config files - OK
2022-10-04 17:25:02 [Info] Backing up tables...
2022-10-04 17:25:02 [Info]   products (rt) [512B]...
2022-10-04 17:25:02 [Info]    OK
2022-10-04 17:25:02 [Info] Running sync
2022-10-04 17:25:06 [Info]  OK
2022-10-04 17:25:06 [Info] You can find backup here: /mnt/backup/backup-20221004172502
2022-10-04 17:25:06 [Info] Elapsed time: 4.82s
2022-10-04 17:25:06 [Info] Done

Arguments

Argument Description
--backup-dir=path This is the path to the backup directory where the backup will be stored. The directory must already exist. This argument is required and has no default value. On each backup run, manticore-backup will create a subdirectory in the provided directory with a timestamp in the name (backup-[datetime]), and will copy all required tables to it. So the --backup-dir is a container for all your backups, and it's safe to run the script multiple times.
--restore[=backup] Restore from --backup-dir. Just --restore lists available backups. --restore=backup will restore from <--backup-dir>/backup.
--force Skip versions check on restore and gracefully restore the backup.
--disable-telemetry Pass this flag in case you want to disable sending anonymized metrics to Manticore. You can also use environment variable TELEMETRY=0
--config=/path/to/manticore.conf Path to the Manticore configuration. Optional. If not provided, a default configuration for your operating system will be used. Used to determine the host and port for communication with the Manticore daemon. The manticore-backup tool supports dynamic configuration files. You can specify the --config option multiple times if your configuration is spread across multiple files.
--tables=tbl1,tbl2, ... Semicolon-separated list of tables that you want to back up. To back up all tables, omit this argument. All the provided tables must exist in the Manticore instance you are backing up from, or the backup will fail.
--compress Whether the backed up files should be compressed. Not enabled by default.
--unlock In rare cases when something goes wrong, tables can be left in a locked state. Use this argument to unlock them.
--version Show the current version.
--help Show this help.

BACKUP SQL command reference

You can also back up your data through SQL by running the simple command BACKUP TO /path/to/backup.

Note, this command is not supported in Windows yet.

General syntax of BACKUP

BACKUP
  [{TABLE | TABLES} a[, b]]
  [{OPTION | OPTIONS}
    async = {on | off | 1 | 0 | true | false | yes | no}
    [, compress = {on | off | 1 | 0 | true | false | yes | no}]
  ]
  TO path_to_backup

For instance, to back up tables a and b to the /backup directory, run the following command:

BACKUP TABLES a, b TO /backup

There are options available to control and adjust the backup process, such as:

BACKUP OPTION async = yes, compress = yes TO /tmp

Important considerations

  1. The path should not contain special symbols or spaces, as they are not supported.
  2. Ensure that Manticore Buddy is launched (it is by default).

How backup maintains consistency of tables

To ensure consistency of tables during backup, Manticore Search's backup tools use the innovative FREEZE and UNFREEZE commands. Unlike the traditional lock and unlock tables feature of e.g. MySQL, FREEZE stops flushing data to disk while still permitting writing (to some extent) and selecting updated data from the table.

However, if your RAM chunk size grows beyond the rt_mem_limit threshold during lengthy backup operations involving many inserts, data may be flushed to disk, and write operations will be blocked until flushing is complete. Despite this, the tool maintains a balance between table locking, data consistency, and database write availability while the table is frozen.

When you use manticore-backup or the SQL BACKUP command, the FREEZE command is executed once and freezes all tables you are backing up simultaneously. The backup process subsequently backs up each table one by one, releasing the freeze after successfully backing up each table.

If backup fails or gets interrupted, the tool tries to unfreeze all the tables.

Restore by using manticore-backup tool

To restore a Manticore instance from a backup, use the manticore-backup command with the --backup-dir and --restore arguments. For example: manticore-backup --backup-dir=/path/to/backups --restore. If you don't provide any argument for --restore, it will simply list all the backups in the --backup-dir.

Example
manticore-backup --backup-dir=/mnt/backup/ --restore
Copyright (c) 2023-2024, Manticore Software LTD (https://manticoresearch.com)

Manticore config file:
Backup dir: /tmp/

Available backups: 3
  backup-20221006144635 (Oct 06 2022 14:46:35)
  backup-20221006145233 (Oct 06 2022 14:52:33)
  backup-20221007104044 (Oct 07 2022 10:40:44)

To start a restore job, run manticore-backup with the flag --restore=backup name, where backup name is the name of the backup directory within the --backup-dir. Note that:
1. There can't be any Manticore instance running on the same host and port as the one being restored.
2. The old manticore.json file must not exist.
3. The old configuration file must not exist.
4. The old data directory must exist and be empty.

If all conditions are met, the restore will proceed. The tool will provide hints, so you don't have to memorize them. It's crucial to avoid overwriting existing files, so make sure to remove them prior to the restore if they still exist. Hence all the conditions.

Example
manticore-backup --backup-dir=/mnt/backup/ --restore=backup-20221007104044
Copyright (c) 2023-2024, Manticore Software LTD (https://manticoresearch.com)

Manticore config file:
Backup dir: /tmp/
2022-10-07 11:17:25 [Info] Starting to restore...

Manticore config
  endpoint =  127.0.0.1:9308
2022-10-07 11:17:25 [Info] Restoring config files...
2022-10-07 11:17:25 [Info]   config files - OK
2022-10-07 11:17:25 [Info] Restoring state files...
2022-10-07 11:17:25 [Info]   config files - OK
2022-10-07 11:17:25 [Info] Restoring data files...
2022-10-07 11:17:25 [Info]   config files - OK
2022-10-07 11:17:25 [Info] The backup '/tmp/backup-20221007104044' was successfully restored.
2022-10-07 11:17:25 [Info] Elapsed time: 0.02s
2022-10-07 11:17:25 [Info] Done

Backup and restore with mysqldump

To create a backup of your Manticore Search database, you can use the mysqldump command. We will use the default port and host in the examples.

Note, mysqldump is supported only for real-time tables.

SQL
mysqldump -h0 -P9306 manticore > manticore_backup.sql
mariadb-dump -h0 -P9306 manticore > manticore_backup.sql
Executing this command will produce a backup file named `manticore_backup.sql`. This file will hold all data and table schemas.

Restore

If you're looking to restore a Manticore Search database from a backup file, the mysql client is your tool of choice.

Note, if you are restoring in Plain mode, you cannot drop and recreate tables directly. Therefore, you should:
- Use mysqldump with the -t option to exclude CREATE TABLE statements from your backup.
- Manually TRUNCATE the tables before proceeding with the restoration.

SQL
mysql -h0 -P9306 < manticore_backup.sql
mariadb -h0 -P9306 < manticore_backup.sql
This command enables you to restore everything from the `manticore_backup.sql` file.

Additional options

Here are some more settings that can be used with mysqldump to tailor your backup:

For a comprehensive list of settings and their thorough descriptions, kindly refer to the official MySQL documentation.

Notes

We recommend specifying the manticore database explicitly when you plan to back up all databases, rather than using the --all-databases option.

Keep in mind that mysqldump does not support backing up distributed tables. Additionally, it cannot back up tables that contain non-stored fields (consider using manticore-backup or the BACKUP SQL command).

¶ 16.2

Real-time table structure

A plain table can be created from an external source using a special tool called indexer, which reads a "recipe" from the configuration, connects to the data sources, pulls documents, and builds table files. This is a lengthy process. If your data changes, the table becomes outdated, and you need to rebuild it from the refreshed sources. If your data changes incrementally, such as a blog or newsfeed where old documents never change and only new ones are added, the rebuild will take more and more time, as you will need to process the archive sources again and again with each pass.

One way to deal with this problem is by using several tables instead of one solid table. For example, you can process sources produced in previous years and save the table. Then, take only sources from the current year and put them into a separate table, rebuilding it as often as necessary. You can then place both tables as parts of a distributed table and use it for querying. The point here is that each time you rebuild, you only process data from the last 12 months at most, and the table with older data remains untouched without needing to be rebuilt. You can go further and divide the last 12 months table into monthly, weekly, or daily tables, and so on.

This approach works, but you need to maintain your distributed table manually. That is, add new chunks, delete old ones, and keep the overall number of partial tables not too large (with too many tables, searching can become slower, and the OS usually limits the number of simultaneously opened files). To deal with this, you can manually merge several tables together by running indexer --merge. However, that only solves the problem of having many tables, making maintenance more challenging. And even with 'per-hour' reindexing, you will most likely have a noticeable time gap between new data arriving in sources and rebuilding the table, which populates this data for searching.

A real-time table is designed to solve this problem. It consists of two parts:

  1. A special RAM-based table (called RAM chunk) that contains portions of data arriving right now.
  2. A collection of plain tables called disk chunks that were built in the past.

This is very similar to a standard distributed table, made from several local tables.

You don't need to build such a table by running indexer, which reads a "recipe" from the config and tables data sources. Instead, the real-time table provides the ability to 'insert' new documents and 'replace' existing ones. When executing the 'insert' command, you push new documents to the server. It then builds a small table from the added documents and immediately brings it online. So, right after the 'insert' command completes, you can perform searches in all table parts, including the just-added documents.

The search server automatically maintains the table, so you don't have to worry about it. However, you might be interested in learning a few details about 'how it is maintained'.

First, since indexed data is stored in RAM - what about emergency power-off? Will I lose my table then? Well, before completion, the server saves new data into a special 'binlog'. This consists of one or several files, living on your persistent storage, which incrementally grows as you add more and more changes. You can adjust the behavior regarding how often new queries (or transactions) are stored in the binlog, and how often the 'sync' command is executed over the binlog file to force the OS to actually save the data on a safe storage. The most paranoid approach is to flush and sync after every transaction. This is the slowest but also the safest method. The least expensive way is to switch off the binlog entirely. This is the fastest method, but you risk losing your indexed data. Intermediate variants, like flush/sync every second, are also provided.

The binlog is designed specifically for sequential saving of newly arriving transactions; it is not a table and cannot be searched over. It is merely an insurance policy to ensure that the server will not lose your data. If a sudden disruption occurs and everything crashes due to a software or hardware problem, the server will load the freshest available dump of the RAM chunk and then replay the binlog, repeating stored transactions. Ultimately, it will achieve the same state as it was in at the moment of the last change.

Second, what about limits? What if I want to process, say, 10TB of data, but it just doesn't fit into RAM! RAM for a real-time table is limited and can be configured. When a certain amount of data is indexed, the server manages the RAM part of the table by merging together small transactions, keeping their number and overall size small. This process can sometimes cause delays during insertion, however. When merging no longer helps, and new insertions hit the RAM limit, the server converts the RAM-based table into a plain table stored on disk (called a disk chunk). This table is added to the collection of tables in the second part of the RT table and becomes accessible online. The RAM is then flushed, and the space is deallocated.

When the data from RAM is securely saved to disk, which occurs:

the binlog for that table is no longer necessary. So, it gets discarded. If all the tables are saved, the binlog will be deleted.

Third, what about disk collection? If having many disk parts makes searching slower, what's the difference if I make them manually in the distributed table manner, or they're produced as disk parts (or, 'chunks') by an RT table? Well, in both cases, you can merge several tables into one. For example, you can merge hourly tables from yesterday and keep one 'daily' table for yesterday instead. With manual maintenance, you have to think about the schema and commands yourself. With an RT table, the server provides the OPTIMIZE command, which does the same, but keeps you away from unnecessary internal details.

Fourth, if my "document" constitutes a 'mini-table' and I don't need it anymore, I can just throw it away. But if it is 'optimized', i.e. mixed together with tons of other documents, how can I undo or delete it? Yes, indexed documents are 'mixed' together, and there is no easy way to delete one without rebuilding the whole table. And if for plain tables rebuilding or merging is just a normal way of maintenance, for a real-time table it keeps only the simplicity of manipulation, but not 'real-timeness'. To address the problem, Manticore uses a trick: when you delete a document, identified by document ID, the server just tracks the number. Together with other deleted documents, their IDs are saved in a so-called kill-list. When you search over the table, the server first retrieves all matching documents, and then throws out the documents that are found in the kill-list (that is the most basic description; in fact, internally it's more complex). The point is - for the sake of 'immediate' deletion, documents are not actually deleted, but are just marked as 'deleted'. They still occupy space in different table structures, being essentially garbage. Word statistics, which affect ranking, also aren't affected, meaning it works exactly as it is declared: we search among all documents, and then just hide ones marked as deleted from the final result. When a document is replaced, it means that it is killed in the old parts of the table and is inserted again in the freshest part. All consequences of 'hiding by killlist' are also in play in this case.

When a rebuild of some part of a table happens, e.g., when some transactions (segments) of a RAM chunk are merged, or when a RAM chunk is converted into a disk chunk, or when two disk chunks are merged together, the server performs a comprehensive iteration over the affected parts and physically excludes deleted documents from all of them. That is, if they were in document lists of some words - they are wiped away. If it was a unique word - it gets removed completely.

As a summary: the deletion works in two phases:
1. First, we mark documents as 'deleted' in real-time and suppress them in search results.
2. During some operation with an RT table chunk, we finally physically wipe the deleted documents for good.

Fifth, if an RT table contains plain disk tables in its collection, can I just add my ready old disk table to it? No. It's not possible to avoid unneeded complexity and prevent accidental corruption. However, if your RT table has just been created and contains no data, then you can ATTACH TABLE your disk table to it. Your old table will be moved inside the RT table and will become its part.

As a summary about the RT table structure: it is a cleverly organized collection of plain disk tables with a fast in-memory table, intended for real-time insertions and semi-real-time deletions of documents. The RT table has a common schema, common settings, and can be easily maintained without deep digging into details.

¶ 16.3

Flushing RAM chunk to a new disk chunk

FLUSH RAMCHUNK

FLUSH RAMCHUNK rt_table

The FLUSH RAMCHUNK command creates a new disk chunk in an RT table.

Normally, an RT table would automatically flush and convert the contents of the RAM chunk into a new disk chunk once the RAM chunk reaches the maximum allowed rt_mem_limit size. However, for debugging and testing purposes, it might be useful to forcibly create a new disk chunk, and the FLUSH RAMCHUNK statement does exactly that.

SQL
FLUSH RAMCHUNK rt;
Query OK, 0 rows affected (0.05 sec)
¶ 16.4

Flushing RAM chunk to disk

FLUSH TABLE

FLUSH TABLE rt_table

FLUSH TABLE forcefully flushes RT table RAM chunk contents to disk.

The real-time table RAM chunk is automatically flushed to disk during a clean shutdown, or periodically every rt_flush_period seconds.

Issuing a FLUSH TABLE command not only forces the RAM chunk contents to be written to disk but also triggers the cleanup of binary log files.

SQL
FLUSH TABLE rt;
Query OK, 0 rows affected (0.05 sec)
¶ 16.5

Compacting a Table

Over time, RT tables may become fragmented into numerous disk chunks and/or contaminated with deleted, yet unpurged data, affecting search performance. In these cases, optimization is necessary. Essentially, the optimization process combines pairs of disk chunks, removing documents that were previously deleted using DELETE statements.

Beginning with Manticore 4, this process occurs automatically by default. However, you can also use the following commands to manually initiate table compaction.

OPTIMIZE TABLE

OPTIMIZE TABLE index_name [OPTION opt_name = opt_value [,...]]

OPTIMIZE statement adds an RT table to the optimization queue, which will be processed in a background thread.

SQL
OPTIMIZE TABLE rt;

Number of optimized disk chunks

By default, OPTIMIZE merges the RT table's disk chunks down to a number equal to # of CPU cores * 2. You can control the number of optimized disk chunks using the cutoff option.

Additional options include:

SQL
OPTIMIZE TABLE rt OPTION cutoff=4;

Running in foreground

When using OPTION sync=1 (0 by default), the command will wait for the optimization process to complete before returning. If the connection is interrupted, the optimization will continue running on the server.

SQL
OPTIMIZE TABLE rt OPTION sync=1;

Throttling the IO impact

Optimization can be a lengthy and I/O-intensive process. To minimize the impact, all actual merge work is executed serially in a special background thread, and the OPTIMIZE statement simply adds a job to its queue. The optimization thread can be I/O-throttled, and you can control the maximum number of I/Os per second and the maximum I/O size with the rt_merge_iops and rt_merge_maxiosize directives, respectively.

During optimization, the RT table being optimized remains online and available for both searching and updates nearly all the time. It is locked for a very brief period when a pair of disk chunks is successfully merged, allowing for the renaming of old and new files and updating the table header.

Optimizing clustered tables

As long as auto_optimize is not disabled, tables are optimized automatically.

If you are experiencing unexpected SSTs or want tables across all nodes of the cluster to be binary identical, you need to:
1. Disable auto_optimize.
2. Manually optimize tables:

On one of the nodes, drop the table from the cluster:

SQL
ALTER CLUSTER mycluster DROP myindex;

Optimize the table:

SQL
OPTIMIZE TABLE myindex;

Add back the table to the cluster:

SQL
ALTER CLUSTER mycluster ADD myindex;

When the table is added back, the new files created by the optimization process will be replicated to the other nodes in the cluster.
Any local changes made to the table on other nodes will be lost.

Table data modifications (inserts, replaces, deletes, updates) should either:

  1. Be postponed, or
  2. Be directed to the node where the optimization process is running.

Note that while the table is out of the cluster, insert/replace/delete/update commands should refer to it without the cluster name prefix (for SQL statements or the cluster property in case of an HTTP JSON request), otherwise they will fail.
Once the table is added back to the cluster, you must resume write operations on the table and include the cluster name prefix again, or they will fail.

Search operations are available as usual during the process on any of the nodes.

¶ 16.6

Isolation during flushing and merging

Manticore provides isolation during the flushing and merging process of a real-time table to prevent any changes from affecting running queries.

For example, during table compaction, a pair of disk chunks are merged and a new chunk is produced. At one point, a new version of the table is created with the new chunk replacing the original pair. This is done seamlessly so that a long-running query using the original chunks will continue to see the old version of the table, while a new query will see the new version with the resulting merged chunk.

The same applies to flushing a RAM chunk, where suitable RAM segments are merged into a new disk chunk and the participated RAM chunk segments are abandoned. During this operation, Manticore provides isolation for queries that started before the operation began.

Furthermore, these operations are transparent for replaces and updates. If you update an attribute in a document that belongs to a disk chunk being merged with another one, the update will be applied to both that chunk and the resulting merged chunk. If you delete a document during a merge, it will be deleted in the original chunk and also in the resulting merged chunk, which will either have the document marked as deleted or have no such document at all if the deletion happened early in the merging process.

¶ 16.7

Freezing a table

FREEZE tbl1[, tbl2, ...]

FREEZE readies a real-time/plain table for a secure backup. Specifically, it:
1. Deactivates table compaction. If the table is currently being compacted, FREEZE will interrupt it.
2. Transfers the current RAM chunk to a disk chunk.
3. Flushes attributes.
4. Disables implicit operations that could modify the disk files.
5. Shows the actual file list associated with the table.

The built-in tool manticore-backup uses FREEZE to ensure data consistency. You can do the same if you want to create your own backup solution or need to freeze tables for other reasons. Just follow these steps:
1. FREEZE a table (or a few).
2. Capture the output of the FREEZE command and back up the specified files.
3. UNFREEZE the table(s) once finished.

Example
FREEZE t;
+-------------------+---------------------------------+
| file              | normalized                      |
+-------------------+---------------------------------+
| data/t/t.0.spa    | /work/anytest/data/t/t.0.spa    |
| data/t/t.0.spd    | /work/anytest/data/t/t.0.spd    |
| data/t/t.0.spds   | /work/anytest/data/t/t.0.spds   |
| data/t/t.0.spe    | /work/anytest/data/t/t.0.spe    |
| data/t/t.0.sph    | /work/anytest/data/t/t.0.sph    |
| data/t/t.0.sphi   | /work/anytest/data/t/t.0.sphi   |
| data/t/t.0.spi    | /work/anytest/data/t/t.0.spi    |
| data/t/t.0.spm    | /work/anytest/data/t/t.0.spm    |
| data/t/t.0.spp    | /work/anytest/data/t/t.0.spp    |
| data/t/t.0.spt    | /work/anytest/data/t/t.0.spt    |
| data/t/t.meta     | /work/anytest/data/t/t.meta     |
| data/t/t.ram      | /work/anytest/data/t/t.ram      |
| data/t/t.settings | /work/anytest/data/t/t.settings |
+-------------------+---------------------------------+
13 rows in set (0.01 sec)

The file column indicates the table's file paths within the data_dir of the running instance. The normalized column displays the absolute paths for the same files. To back up a table, simply copy the provided files without additional preparation.

When a table is frozen, you cannot execute UPDATE queries; they will fail with the error message "index is locked now, try again later."

Also, DELETE and REPLACE queries have some restrictions while the table is frozen:

Manually FLUSHing a RAM chunk of a frozen table will report 'success', but no real saving will occur.

DROP/TRUNCATE of a frozen table is allowed since these operations are not implicit. We assume that if you truncate or drop a table, you don't need it backed up; therefore, it should not have been frozen initially.

INSERTing into a frozen table is supported but limited: new data will be stored in RAM (as usual) until rt_mem_limit is reached; then, new insertions will wait until the table is unfrozen.

If you shut down the daemon with a frozen table, it will act as if it experienced a dirty shutdown (e.g., kill -9): newly inserted data will not be saved in the RAM-chunk on disk, and upon restart, it will be restored from a binary log (if any) or lost (if binary logging is disabled).

Unfreezing a table

UNFREEZE tbl1[, tbl2, ...]

UNFREEZE reactivates previously blocked operations and resumes the internal compaction service. All operations waiting for a table to unfreeze will also be unfrozen and complete normally.

Example
UNFREEZE tbl;
¶ 16.8

FLUSH ATTRIBUTES

FLUSH ATTRIBUTES

The FLUSH ATTRIBUTES command flushes all in-memory attribute updates in all the active disk tables to disk. It returns a tag that identifies the result on-disk state (which is basically a number of actual disk attribute saves performed since the server startup).

mysql> UPDATE testindex SET channel_id=1107025 WHERE id=1;
Query OK, 1 row affected (0.04 sec)

mysql> FLUSH ATTRIBUTES;
+------+
| tag  |
+------+
|    1 |
+------+
1 row in set (0.19 sec)
¶ 16.9

FLUSH HOSTNAMES

FLUSH HOSTNAMES

The FLUSH HOSTNAMES command is used to renew IP addresses associated with agent host names. If you want to always query the DNS for getting the host name IP, you can use the hostname_lookup directive.

mysql> FLUSH HOSTNAMES;
Query OK, 5 rows affected (0.01 sec)
¶ 17

Security

¶ 17.1

SSL

In many cases, you might want to encrypt traffic between your client and the server. To do that, you can specify that the server should use the HTTPS protocol rather than HTTP.

To enable HTTPS, at least the following two directives should be set in the searchd section of the config, and there should be at least one listener set to https

In addition to that, you can specify the certificate authority's certificate (aka root certificate) in:

with CA
Example with CA:
ssl_ca = ca-cert.pem
ssl_cert = server-cert.pem
ssl_key = server-key.pem
without CA
Example without CA:
ssl_cert = server-cert.pem
ssl_key = server-key.pem

Generating SSL files

These steps will help you generate the SSL certificates using the 'openssl' tool.

The server can use a Certificate Authority to verify the signature of certificates, but it can also work with just a private key and certificate (without the CA certificate).

Generate the CA key

openssl genrsa 2048 > ca-key.pem

Generate the CA certificate from the CA key

To generate a self-signed CA (root) certificate from the private key (make sure to fill in at least the "Common Name"), use the following command:

openssl req -new -x509 -nodes -days 365 -key ca-key.pem -out ca-cert.pem

Server Certificate

The server uses the server certificate to secure communication with the client. To generate the certificate request and server private key (ensure that you fill in at least the "Common Name" and that it is different from the root certificate's common name), execute the following commands:

openssl req -newkey rsa:2048 -days 365 -nodes -keyout server-key.pem -out server-req.pem
openssl rsa -in server-key.pem -out server-key.pem
openssl x509 -req -in server-req.pem -days 365 -CA ca-cert.pem -CAkey ca-key.pem -set_serial 01 -out server-cert.pem

Once completed, you can verify that the key and certificate files were generated correctly by running:

openssl verify -CAfile ca-cert.pem server-cert.pem

Secured connection behaviour

When your SSL configuration is valid, the following features are available:

If your SSL configuration is not valid for any reason (which the daemon detects by the fact that a secured connection cannot be established), apart from an invalid configuration there may be other reasons, such as the inability to load the appropriate SSL library at all. In this case, the following things will not work or will work in a non-secured manner:

Caution:

¶ 17.2

Read-only mode

Read-only mode for a connection disables any table or global modifications. Therefore, queries like create, drop, various types of alter, attach, optimize, and data modification queries such as insert, replace, delete, update, and others will all be rejected. Changing daemon-wide settings using SET GLOBAL is also not possible in this mode.

However, you can still perform all search operations, generate snippets, and run CALL PQ queries. Additionally, you can modify local (connection-wide) settings.

To check if your current connection is read-only or not, execute the show variables like 'session_read_only' statement. A value of 1 indicates read-only, while 0 means not read-only (usual).

Activation

Typically, you define a separate listen directive in read-only mode by adding the suffix _readonly to it. However, you can also do this interactively for the current connection by executing the SET ro=1 statement via SQL.

Deactivation

If you're connected to a VIP socket, you can execute SET ro=0 (even if the socket you are connected to was defined as read-only in the config and not interactively). This will switch the connection to the usual (not read-only) mode with all modifications allowed.

For standard (non-VIP) connections, escaping read-only mode is only possible by reconnecting if it was set interactively, or by updating the configuration file and restarting the daemon.

¶ 18

Logging

¶ 18.1

Query logging

Query logging can be enabled by setting the query_log directive in the searchd section of the configuration file.

Queries can also be sent to syslog by setting syslog instead of a file path. In this case, all search queries will be sent to the syslog daemon with LOG_INFO priority, prefixed with [query] instead of a timestamp. Only the plain log format is supported for syslog.

query_log example:

Config
searchd {
...
    query_log = /var/log/query.log
    query_log_format = sphinxql # default
...
}

Logging format

Two query log formats are supported:

To switch between the formats, you can use the searchd setting query_log_format.

SQL log format

The SQL log format is the default setting. In this mode, Manticore logs all successful and unsuccessful select queries. Requests sent as SQL or via the binary API are logged in the SQL format, but JSON queries are logged as is. This type of logging only works with plain log files and does not support the 'syslog' service for logging.

query_log_format example:

Config
query_log_format = sphinxql # default

The features of the Manticore SQL log format compared to the plain format include:

sphinxql log entries example:

Example
/* Sun Apr 28 12:38:02.808 2024 conn 2 (127.0.0.1:53228) real 0.000 wall 0.000 found 0 */ SELECT * FROM test WHERE MATCH('test') OPTION ranker=proximity;
/* Sun Apr 28 12:38:05.585 2024 conn 2 (127.0.0.1:53228) real 0.001 wall 0.001 found 0 */ SELECT * FROM test WHERE MATCH('test') GROUP BY channel_id OPTION ranker=proximity;
/* Sun Apr 28 12:40:57.366 2024 conn 4 (127.0.0.1:53256) real 0.000 wall 0.000 found 0 */  /*{
    "index" : "test",
    "query":
    {
        "match":
        {
            "*" : "test"
        }
    },
    "_source": ["f"],
    "limit": 30
} */

Plain log format

With the plain log format, Manticore logs all successfully executed search queries in a simple text format. Non-full-text parts of queries are not logged. JSON queries are logged as flattened to a single line.

query_log_format example:

Config
query_log_format = plain

The log format is as follows:

[query-date] real-time wall-time [match-mode/filters-count/sort-mode total-matches (offset,limit) @groupby-attr] [table-name] {perf-stats} query

where:

Note: the SPH* modes are specific to the sphinx legacy interface. SQL and JSON interfaces will log, in most cases, ext2 as match-mode and ext and rel as sort-mode.

Query log example:

Example
[Fri Jun 29 21:17:58 2021] 0.004 sec [all/0/rel 35254 (0,20)] [lj] [ios=6 kb=111.1 ms=0.5] test
[Fri Jun 29 21:17:58 2021] 0.004 sec [all/0/rel 35254 (0,20)] [lj] [ios=6 kb=111.1 ms=0.5 cpums=0.3] test
[Sun Apr 28 15:09:38.712 2024] 0.000 sec 0.000 sec [ext2/0/ext 0 (0,20)] [test] test
[Sun Apr 28 15:09:44.974 2024] 0.000 sec 0.000 sec [ext2/0/ext 0 (0,20) @channel_id] [test] test
[Sun Apr 28 15:24:32.975 2024] 0.000 sec 0.000 sec [ext2/0/ext 0 (0,30)] [test] {     "index" : "test",     "query":     {         "match":         {             "*" : "test"         }     },     "_source": ["f"],     "limit": 30 }

Logging only slow queries

By default, all queries are logged. If you want to log only queries with execution times exceeding a specified limit, the query_log_min_msec directive can be used.

The expected unit of measurement is milliseconds, but time suffix expressions can also be used.

query_log_min_msec example:

Config
searchd {
...
    query_log = /var/log/query.log
    query_log_min_msec  = 1000
    # query_log_min_msec  = 1s
...
}

Log file permission mode

By default, the searchd and query log files are created with permission 600, so only the user under which Manticore is running and root can read the log files. The query_log_mode option allows setting a different permission. This can be helpful for allowing other users to read the log files (for example, monitoring solutions running on non-root users).

query_log_mode example:

Config
searchd {
...
    query_log = /var/log/query.log
    query_log_mode = 666
...
}
¶ 18.2

Server logging

By default, Manticore search daemon logs all runtime events in a searchd.log file in the directory where searchd was started. In Linux by default, you can find the log at /var/log/manticore/searchd.log.

The log file path/name can be overridden by setting log in the searchd section of the configuration file.

searchd {
...
    log = /custom/path/to/searchd.log
...
}
¶ 18.3

Binary logging

Binary logging serves as a recovery mechanism for Real-Time table data, as well as attribute updates for plain tables that would otherwise only be stored in RAM until a flush occurs. When binary logs are enabled, searchd records each transaction to the binlog file and utilizes it for recovery following an unclean shutdown. During a clean shutdown, RAM chunks are saved to disk, and all binlog files are subsequently unlinked.

Disabling binary logging

By default, binary logging is enabled. On Linux systems, the default location for binlog.* files is /var/lib/manticore/data/.
In RT mode, binary logs are stored in the data_dir folder, unless specified otherwise.

To disable binary logging, set binlog_path to empty:

searchd {
...
    binlog_path = # disable logging
...

Disabling binary logging can lead to better performance for Real-Time tables, but it also puts their data at risk.

You can use the following directive to set a custom path:

searchd {
...
    binlog_path = /var/data
...

Operations

When logging is enabled, each transaction committed to an RT table is written to a log file. After an unclean shutdown, logs are automatically replayed upon startup, recovering any logged changes.

Log size

During normal operation, a new binlog file is opened whenever the binlog_max_log_size limit is reached. Older, closed binlog files are retained until all transactions stored in them (from all tables) are flushed as a disk chunk. Setting the limit to 0 essentially prevents the binlog from being unlinked while searchd is running; however, it will still be unlinked upon a clean shutdown. By default, there is no limit to the log file size.

binlog_max_log_size = 16M

Binary flushing strategies

There are 3 different binlog flushing strategies, controlled by the binlog_flush directive:

The default mode is to flush every transaction and sync every second (mode 2).

searchd {
...
    binlog_flush = 1 # ultimate safety, low speed
...
}

Recovery

During recovery after an unclean shutdown, binlogs are replayed, and all logged transactions since the last good on-disk state are restored. Transactions are checksummed, so in case of binlog file corruption, garbage data will not be replayed; such a broken transaction will be detected and will stop the replay.

Flushing RT RAM chunks

Intensive updates to a small RT table that fully fits into a RAM chunk can result in an ever-growing binlog that can never be unlinked until a clean shutdown. Binlogs essentially serve as append-only deltas against the last known good saved state on disk, and they cannot be unlinked unless the RAM chunk is saved. An ever-growing binlog is not ideal for disk usage and crash recovery time. To address this issue, you can configure searchd to perform periodic RAM chunk flushes using the rt_flush_period directive. With periodic flushes enabled, searchd will maintain a separate thread that checks whether RT table RAM chunks need to be written back to disk. Once this occurs, the respective binlogs can be (and are) safely unlinked.

searchd {
...
    rt_flush_period = 3600 # 1 hour
...
}

The default RT flush period is set to 10 hours.

It's important to note that rt_flush_period only controls the frequency at which checks occur. There are no guarantees that a specific RAM chunk will be saved. For example, it doesn't make sense to regularly re-save a large RAM chunk that only receives a few rows worth of updates. Manticore automatically determines whether to perform the flush using a few heuristics.

¶ 18.4

Docker logging

When you use the official Manticore docker image, the server log is sent to /dev/stdout which can be viewed from host with:

docker logs manticore

The query log can be diverted to the Docker log by passing the variable QUERY_LOG_TO_STDOUT=true.

The log folder is the same as in the case of the Linux package, set to /var/log/manticore. If desired, it can be mounted to a local path to view or process the logs.

¶ 18.5

Rotating query and server logs

Manticore Search accepts the USR1 signal for reopening server and query log files.

The official DEB and RPM packages install a Logrotate configuration file for all files in the default log folder.

A simple logrotate configuration for log files looks like:

/var/log/manticore/*.log {
       weekly
       rotate 10
       copytruncate
       delaycompress
       compress
       notifempty
       missingok
}

FLUSH LOGS

mysql> FLUSH LOGS;
Query OK, 0 rows affected (0.01 sec)

Additionally, the FLUSH LOGS SQL command is available, which works the same way as the USR1 system signal. It initiates the reopening of searchd log and query log files, allowing you to implement log file rotation. The command is non-blocking (i.e., it returns immediately).

¶ 19

Node info and management

¶ 19.1

Node status

STATUS

The easiest way to view high-level information about your Manticore node is by running status in the MySQL client. It will display information about various aspects, such as:

SQL
mysql> status
--------------
mysql  Ver 14.14 Distrib 5.7.30, for Linux (x86_64) using  EditLine wrapper

Connection id:      378
Current database:   Manticore
Current user:       Usual
SSL:            Not in use
Current pager:      stdout
Using outfile:      ''
Using delimiter:    ;
Server version:     3.4.3 a48c61d6@200702 coroutines git branch coroutines_work_junk...origin/coroutines_work_junk
Protocol version:   10
Connection:     0 via TCP/IP
Server characterset:
Db     characterset:
Client characterset:    utf8
Conn.  characterset:    utf8
TCP port:       8306
Uptime:         23 hours 6 sec

Threads: 12  Queue: 3  Clients: 1  Vip clients: 0  Tasks: 5  Queries: 318967  Wall: 7h  CPU: 0us
Queue/Th: 0.2  Tasks/Th: 0.4
--------------

SHOW STATUS

SHOW STATUS [ LIKE pattern ]

SHOW STATUS is an SQL statement that presents various helpful performance counters. IO and CPU counters will only be available if searchd was started with the --iostats and --cpustats switches, respectively (or if they were enabled via SET GLOBAL iostats/cpustats=1).

SQL
SHOW STATUS;
+-----------------------+---------------------------+
| Counter               | Value                     |
+-----------------------+---------------------------+
| uptime                | 1385                      |
| connections           | 11                        |
| maxed_out             | 0                         |
| version               | 3.4.3 ab7cbe5d@200511 dev |
| mysql_version         | 3.4.3 ab7cbe5d@200511 dev |
| command_search        | 2                         |
| command_excerpt       | 0                         |
| command_update        | 0                         |
| command_delete        | 0                         |
| command_keywords      | 0                         |
| command_persist       | 0                         |
| command_status        | 1                         |
| command_flushattrs    | 0                         |
| command_set           | 1                         |
| command_insert        | 0                         |
| command_replace       | 0                         |
| command_commit        | 0                         |
| command_suggest       | 0                         |
| command_json          | 0                         |
| command_callpq        | 0                         |
| agent_connect         | 0                         |
| agent_retry           | 0                         |
| queries               | 12                        |
| dist_queries          | 0                         |
| workers_total         | 30                        |
| workers_active        | 1                         |
| workers_clients       | 0                         |
| workers_clients_vip   | 1                         |
| work_queue_length     | 1                         |
| query_wall            | 10.805                    |
| query_cpu             | OFF                       |
| dist_wall             | 0.000                     |
| dist_local            | 0.000                     |
| dist_wait             | 0.000                     |
| query_reads           | OFF                       |
| query_readkb          | OFF                       |
| query_readtime        | OFF                       |
| avg_query_wall        | 0.900                     |
| avg_query_cpu         | OFF                       |
| avg_dist_wall         | 0.000                     |
| avg_dist_local        | 0.000                     |
| avg_dist_wait         | 0.000                     |
| avg_query_reads       | OFF                       |
| avg_query_readkb      | OFF                       |
| avg_query_readtime    | OFF                       |
| qcache_max_bytes      | 0                         |
| qcache_thresh_msec    | 3000                      |
| qcache_ttl_sec        | 60                        |
| qcache_cached_queries | 0                         |
| qcache_used_bytes     | 0                         |
| qcache_hits           | 0                         |
+-----------------------+---------------------------+
49 rows in set (0.00 sec)

An optional LIKE clause is supported, allowing you to select only the variables that match a specific pattern. The pattern syntax follows standard SQL wildcards, where % represents any number of any characters, and _ represents a single character.

SQL
SHOW STATUS LIKE 'qcache%';
+-----------------------+-------+
| Counter               | Value |
+-----------------------+-------+
| qcache_max_bytes      | 0     |
| qcache_thresh_msec    | 3000  |
| qcache_ttl_sec        | 60    |
| qcache_cached_queries | 0     |
| qcache_used_bytes     | 0     |
| qcache_hits           | 0     |
+-----------------------+-------+
6 rows in set (0.00 sec)

SHOW SETTINGS

SHOW SETTINGS is an SQL statement that displays the current settings from your configuration file. The setting names are represented in the following format: 'config_section_name'.'setting_name'

The result also includes two additional values:
- configuration_file - The path to the configuration file
- worker_pid - The process ID of the running searchd instance

SQL
SHOW SETTINGS;
+--------------------------+-------------------------------------+
| Setting_name             | Value                               |
+--------------------------+-------------------------------------+
| configuration_file       | /etc/manticoresearch/manticore.conf |
| worker_pid               | 658                                 |
| searchd.listen           | 0.0.0:9312                          |
| searchd.listen           | 0.0.0:9306:mysql                    |
| searchd.listen           | 0.0.0:9308:http                     |
| searchd.log              | /var/log/manticore/searchd.log      |
| searchd.query_log        | /var/log/manticore/query.log        |
| searchd.pid_file         | /var/run/manticore/searchd.pid      |
| searchd.data_dir         | /var/lib/manticore                  |
| searchd.query_log_format | sphinxql                            |
| searchd.binlog_path      | /var/lib/manticore/binlog           |
| common.plugin_dir        | /usr/local/lib/manticore            |
| common.lemmatizer_base   | /usr/share/manticore/morph/         |
+--------------------------+-------------------------------------+
13 rows in set (0.00 sec)

SHOW AGENT STATUS

SHOW AGENT ['agent_or_index'] STATUS [ LIKE pattern ]

SHOW AGENT STATUS displays the statistics of remote agents or a distributed table. It includes values such as the age of the last request, last answer, the number of various types of errors and successes, and so on. Statistics are displayed for every agent for the last 1, 5, and 15 intervals, each consisting of ha_period_karma seconds.

SQL
SHOW AGENT STATUS;
+------------------------------------+----------------------------+
| Variable_name                      | Value                      |
+------------------------------------+----------------------------+
| status_period_seconds              | 60                         |
| status_stored_periods              | 15                         |
| ag_0_hostname                      | 192.168.0.202:6713         |
| ag_0_references                    | 2                          |
| ag_0_lastquery                     | 0.41                       |
| ag_0_lastanswer                    | 0.19                       |
| ag_0_lastperiodmsec                | 222                        |
| ag_0_pingtripmsec                  | 10.521                     |
| ag_0_errorsarow                    | 0                          |
| ag_0_1periods_query_timeouts       | 0                          |
| ag_0_1periods_connect_timeouts     | 0                          |
| ag_0_1periods_connect_failures     | 0                          |
| ag_0_1periods_network_errors       | 0                          |
| ag_0_1periods_wrong_replies        | 0                          |
| ag_0_1periods_unexpected_closings  | 0                          |
| ag_0_1periods_warnings             | 0                          |
| ag_0_1periods_succeeded_queries    | 27                         |
| ag_0_1periods_msecsperquery        | 232.31                     |
| ag_0_5periods_query_timeouts       | 0                          |
| ag_0_5periods_connect_timeouts     | 0                          |
| ag_0_5periods_connect_failures     | 0                          |
| ag_0_5periods_network_errors       | 0                          |
| ag_0_5periods_wrong_replies        | 0                          |
| ag_0_5periods_unexpected_closings  | 0                          |
| ag_0_5periods_warnings             | 0                          |
| ag_0_5periods_succeeded_queries    | 146                        |
| ag_0_5periods_msecsperquery        | 231.83                     |
| ag_1_hostname                      | 192.168.0.202:6714         |
| ag_1_references                    | 2                          |
| ag_1_lastquery                     | 0.41                       |
| ag_1_lastanswer                    | 0.19                       |
| ag_1_lastperiodmsec                | 220                        |
| ag_1_pingtripmsec                  | 10.004                     |
| ag_1_errorsarow                    | 0                          |
| ag_1_1periods_query_timeouts       | 0                          |
| ag_1_1periods_connect_timeouts     | 0                          |
| ag_1_1periods_connect_failures     | 0                          |
| ag_1_1periods_network_errors       | 0                          |
| ag_1_1periods_wrong_replies        | 0                          |
| ag_1_1periods_unexpected_closings  | 0                          |
| ag_1_1periods_warnings             | 0                          |
| ag_1_1periods_succeeded_queries    | 27                         |
| ag_1_1periods_msecsperquery        | 231.24                     |
| ag_1_5periods_query_timeouts       | 0                          |
| ag_1_5periods_connect_timeouts     | 0                          |
| ag_1_5periods_connect_failures     | 0                          |
| ag_1_5periods_network_errors       | 0                          |
| ag_1_5periods_wrong_replies        | 0                          |
| ag_1_5periods_unexpected_closings  | 0                          |
| ag_1_5periods_warnings             | 0                          |
| ag_1_5periods_succeeded_queries    | 146                        |
| ag_1_5periods_msecsperquery        | 230.85                     |
+------------------------------------+----------------------------+
50 rows in set (0.01 sec)
PHP
$client->nodes()->agentstatus();
Array(
    [status_period_seconds] => 60
    [status_stored_periods] => 15
    [ag_0_hostname] => 192.168.0.202:6713
    [ag_0_references] => 2
    [ag_0_lastquery] => 0.41
    [ag_0_lastanswer] => 0.19
    [ag_0_lastperiodmsec] => 222  
    [ag_0_errorsarow] => 0
    [ag_0_1periods_query_timeouts] => 0
    [ag_0_1periods_connect_timeouts] => 0
    [ag_0_1periods_connect_failures] => 0
    [ag_0_1periods_network_errors] => 0
    [ag_0_1periods_wrong_replies] => 0
    [ag_0_1periods_unexpected_closings] => 0
    [ag_0_1periods_warnings] => 0
    [ag_0_1periods_succeeded_queries] => 27
    [ag_0_1periods_msecsperquery] => 232.31
    [ag_0_5periods_query_timeouts] => 0
    [ag_0_5periods_connect_timeouts] => 0
    [ag_0_5periods_connect_failures] => 0
    [ag_0_5periods_network_errors] => 0
    [ag_0_5periods_wrong_replies] => 0
    [ag_0_5periods_unexpected_closings] => 0
    [ag_0_5periods_warnings] => 0
    [ag_0_5periods_succeeded_queries] => 146  
    [ag_0_5periods_msecsperquery] => 231.83
    [ag_1_hostname 192.168.0.202:6714
    [ag_1_references] => 2
    [ag_1_lastquery] => 0.41
    [ag_1_lastanswer] => 0.19
    [ag_1_lastperiodmsec] => 220  
    [ag_1_errorsarow] => 0
    [ag_1_1periods_query_timeouts] => 0
    [ag_1_1periods_connect_timeouts] => 0
    [ag_1_1periods_connect_failures] => 0
    [ag_1_1periods_network_errors] => 0
    [ag_1_1periods_wrong_replies] => 0
    [ag_1_1periods_unexpected_closings] => 0
    [ag_1_1periods_warnings] => 0
    [ag_1_1periods_succeeded_queries] => 27
    [ag_1_1periods_msecsperquery] => 231.24
    [ag_1_5periods_query_timeouts] => 0
    [ag_1_5periods_connect_timeouts] => 0
    [ag_1_5periods_connect_failures] => 0
    [ag_1_5periods_network_errors] => 0
    [ag_1_5periods_wrong_replies] => 0
    [ag_1_5periods_unexpected_closings
    [ag_1_5periods_warnings] => 0
    [ag_1_5periods_succeeded_queries] => 146  
    [ag_1_5periods_msecsperquery] => 230.85
)
Python
utilsApi.sql('SHOW AGENT STATUS')
{u'columns': [{u'Key': {u'type': u'string'}},
              {u'Value': {u'type': u'string'}}],
 u'data': [
    {u'Key': u'status_period_seconds', u'Value': u'60'},
    {u'Key': u'status_stored_periods', u'Value': u'15'},
    {u'Key': u'ag_0_hostname', u'Value': u'192.168.0.202:6713'},
    {u'Key': u'ag_0_references', u'Value': u'2'},
    {u'Key': u'ag_0_lastquery', u'Value': u'0.41'},
    {u'Key': u'ag_0_lastanswer', u'Value': u'0.19'},
    {u'Key': u'ag_0_lastperiodmsec', u'Value': u'222'},
    {u'Key': u'ag_0_errorsarow', u'Value': u'0'},
    {u'Key': u'ag_0_1periods_query_timeouts', u'Value': u'0'},
    {u'Key': u'ag_0_1periods_connect_timeouts', u'Value': u'0'},
    {u'Key': u'ag_0_1periods_connect_failures', u'Value': u'0'},
    {u'Key': u'ag_0_1periods_network_errors', u'Value': u'0'},
    {u'Key': u'ag_0_1periods_wrong_replies', u'Value': u'0'},
    {u'Key': u'ag_0_1periods_unexpected_closings', u'Value': u'0'},
    {u'Key': u'ag_0_1periods_warnings', u'Value': u'0'},
    {u'Key': u'ag_0_1periods_succeeded_queries', u'Value': u'27'},
    {u'Key': u'ag_0_1periods_msecsperquery', u'Value': u'232.31'},
    {u'Key': u'ag_0_5periods_query_timeouts', u'Value': u'0'},
    {u'Key': u'ag_0_5periods_connect_timeouts', u'Value': u'0'},
    {u'Key': u'ag_0_5periods_connect_failures', u'Value': u'0'},
    {u'Key': u'ag_0_5periods_network_errors', u'Value': u'0'},
    {u'Key': u'ag_0_5periods_wrong_replies', u'Value': u'0'},
    {u'Key': u'ag_0_5periods_unexpected_closings', u'Value': u'0'},
    {u'Key': u'ag_0_5periods_warnings', u'Value': u'0'},
    {u'Key': u'ag_0_5periods_succeeded_queries', u'Value': u'146'},
    {u'Key': u'ag_0_5periods_msecsperquery', u'Value': u'231.83'},
    {u'Key': u'ag_1_hostname 192.168.0.202:6714'},
    {u'Key': u'ag_1_references', u'Value': u'2'},
    {u'Key': u'ag_1_lastquery', u'Value': u'0.41'},
    {u'Key': u'ag_1_lastanswer', u'Value': u'0.19'},
    {u'Key': u'ag_1_lastperiodmsec', u'Value': u'220'},
    {u'Key': u'ag_1_errorsarow', u'Value': u'0'},
    {u'Key': u'ag_1_1periods_query_timeouts', u'Value': u'0'},
    {u'Key': u'ag_1_1periods_connect_timeouts', u'Value': u'0'},
    {u'Key': u'ag_1_1periods_connect_failures', u'Value': u'0'},
    {u'Key': u'ag_1_1periods_network_errors', u'Value': u'0'},
    {u'Key': u'ag_1_1periods_wrong_replies', u'Value': u'0'},
    {u'Key': u'ag_1_1periods_unexpected_closings', u'Value': u'0'},
    {u'Key': u'ag_1_1periods_warnings', u'Value': u'0'},
    {u'Key': u'ag_1_1periods_succeeded_queries', u'Value': u'27'},
    {u'Key': u'ag_1_1periods_msecsperquery', u'Value': u'231.24'},
    {u'Key': u'ag_1_5periods_query_timeouts', u'Value': u'0'},
    {u'Key': u'ag_1_5periods_connect_timeouts', u'Value': u'0'},
    {u'Key': u'ag_1_5periods_connect_failures', u'Value': u'0'},
    {u'Key': u'ag_1_5periods_network_errors', u'Value': u'0'},
    {u'Key': u'ag_1_5periods_wrong_replies', u'Value': u'0'},
    {u'Key': u'ag_1_5periods_warnings', u'Value': u'0'},
    {u'Key': u'ag_1_5periods_succeeded_queries', u'Value': u'146'},
    {u'Key': u'ag_1_5periods_msecsperquery', u'Value': u'230.85'}],
 u'error': u'',
 u'total': 0,
 u'warning': u''}
javascript
res = await utilsApi.sql("SHOW AGENT STATUS");
{"columns": [{"Key": {"type": "string"}},
              {"Value": {"type": "string"}}],
 "data": [
    {"Key": "status_period_seconds", "Value": "60"},
    {"Key": "status_stored_periods", "Value": "15"},
    {"Key": "ag_0_hostname", "Value": "192.168.0.202:6713"},
    {"Key": "ag_0_references", "Value": "2"},
    {"Key": "ag_0_lastquery", "Value": "0.41"},
    {"Key": "ag_0_lastanswer", "Value": "0.19"},
    {"Key": "ag_0_lastperiodmsec", "Value": "222"},
    {"Key": "ag_0_errorsarow", "Value": "0"},
    {"Key": "ag_0_1periods_query_timeouts", "Value": "0"},
    {"Key": "ag_0_1periods_connect_timeouts", "Value": "0"},
    {"Key": "ag_0_1periods_connect_failures", "Value": "0"},
    {"Key": "ag_0_1periods_network_errors", "Value": "0"},
    {"Key": "ag_0_1periods_wrong_replies", "Value": "0"},
    {"Key": "ag_0_1periods_unexpected_closings", "Value": "0"},
    {"Key": "ag_0_1periods_warnings", "Value": "0"},
    {"Key": "ag_0_1periods_succeeded_queries", "Value": "27"},
    {"Key": "ag_0_1periods_msecsperquery", "Value": "232.31"},
    {"Key": "ag_0_5periods_query_timeouts", "Value": "0"},
    {"Key": "ag_0_5periods_connect_timeouts", "Value": "0"},
    {"Key": "ag_0_5periods_connect_failures", "Value": "0"},
    {"Key": "ag_0_5periods_network_errors", "Value": "0"},
    {"Key": "ag_0_5periods_wrong_replies", "Value": "0"},
    {"Key": "ag_0_5periods_unexpected_closings", "Value": "0"},
    {"Key": "ag_0_5periods_warnings", "Value": "0"},
    {"Key": "ag_0_5periods_succeeded_queries", "Value": "146"},
    {"Key": "ag_0_5periods_msecsperquery", "Value": "231.83"},
    {"Key": "ag_1_hostname 192.168.0.202:6714"},
    {"Key": "ag_1_references", "Value": "2"},
    {"Key": "ag_1_lastquery", "Value": "0.41"},
    {"Key": "ag_1_lastanswer", "Value": "0.19"},
    {"Key": "ag_1_lastperiodmsec", "Value": "220"},
    {"Key": "ag_1_errorsarow", "Value": "0"},
    {"Key": "ag_1_1periods_query_timeouts", "Value": "0"},
    {"Key": "ag_1_1periods_connect_timeouts", "Value": "0"},
    {"Key": "ag_1_1periods_connect_failures", "Value": "0"},
    {"Key": "ag_1_1periods_network_errors", "Value": "0"},
    {"Key": "ag_1_1periods_wrong_replies", "Value": "0"},
    {"Key": "ag_1_1periods_unexpected_closings", "Value": "0"},
    {"Key": "ag_1_1periods_warnings", "Value": "0"},
    {"Key": "ag_1_1periods_succeeded_queries", "Value": "27"},
    {"Key": "ag_1_1periods_msecsperquery", "Value": "231.24"},
    {"Key": "ag_1_5periods_query_timeouts", "Value": "0"},
    {"Key": "ag_1_5periods_connect_timeouts", "Value": "0"},
    {"Key": "ag_1_5periods_connect_failures", "Value": "0"},
    {"Key": "ag_1_5periods_network_errors", "Value": "0"},
    {"Key": "ag_1_5periods_wrong_replies", "Value": "0"},
    {"Key": "ag_1_5periods_warnings", "Value": "0"},
    {"Key": "ag_1_5periods_succeeded_queries", "Value": "146"},
    {"Key": "ag_1_5periods_msecsperquery", "Value": "230.85"}],
 "error": "",
 "total": 0,
 "warning": ""}
Java
utilsApi.sql("SHOW AGENT STATUS");
{columns=[{ Key : { type=string }},
              { Value : { type=string }}],
  data : [
    { Key=status_period_seconds ,  Value=60 },
    { Key=status_stored_periods ,  Value=15 },
    { Key=ag_0_hostname ,  Value=192.168.0.202:6713 },
    { Key=ag_0_references ,  Value=2 },
    { Key=ag_0_lastquery ,  Value=0.41 },
    { Key=ag_0_lastanswer ,  Value=0.19 },
    { Key=ag_0_lastperiodmsec ,  Value=222 },
    { Key=ag_0_errorsarow ,  Value=0 },
    { Key=ag_0_1periods_query_timeouts ,  Value=0 },
    { Key=ag_0_1periods_connect_timeouts ,  Value=0 },
    { Key=ag_0_1periods_connect_failures ,  Value=0 },
    { Key=ag_0_1periods_network_errors ,  Value=0 },
    { Key=ag_0_1periods_wrong_replies ,  Value=0 },
    { Key=ag_0_1periods_unexpected_closings ,  Value=0 },
    { Key=ag_0_1periods_warnings ,  Value=0 },
    { Key=ag_0_1periods_succeeded_queries ,  Value=27 },
    { Key=ag_0_1periods_msecsperquery ,  Value=232.31 },
    { Key=ag_0_5periods_query_timeouts ,  Value=0 },
    { Key=ag_0_5periods_connect_timeouts ,  Value=0 },
    { Key=ag_0_5periods_connect_failures ,  Value=0 },
    { Key=ag_0_5periods_network_errors ,  Value=0 },
    { Key=ag_0_5periods_wrong_replies ,  Value=0 },
    { Key=ag_0_5periods_unexpected_closings ,  Value=0 },
    { Key=ag_0_5periods_warnings ,  Value=0 },
    { Key=ag_0_5periods_succeeded_queries ,  Value=146 },
    { Key=ag_0_5periods_msecsperquery ,  Value=231.83 },
    { Key=ag_1_hostname 192.168.0.202:6714 },
    { Key=ag_1_references ,  Value=2 },
    { Key=ag_1_lastquery ,  Value=0.41 },
    { Key=ag_1_lastanswer ,  Value=0.19 },
    { Key=ag_1_lastperiodmsec ,  Value=220 },
    { Key=ag_1_errorsarow ,  Value=0 },
    { Key=ag_1_1periods_query_timeouts ,  Value=0 },
    { Key=ag_1_1periods_connect_timeouts ,  Value=0 },
    { Key=ag_1_1periods_connect_failures ,  Value=0 },
    { Key=ag_1_1periods_network_errors ,  Value=0 },
    { Key=ag_1_1periods_wrong_replies ,  Value=0 },
    { Key=ag_1_1periods_unexpected_closings ,  Value=0 },
    { Key=ag_1_1periods_warnings ,  Value=0 },
    { Key=ag_1_1periods_succeeded_queries ,  Value=27 },
    { Key=ag_1_1periods_msecsperquery ,  Value=231.24 },
    { Key=ag_1_5periods_query_timeouts ,  Value=0 },
    { Key=ag_1_5periods_connect_timeouts ,  Value=0 },
    { Key=ag_1_5periods_connect_failures ,  Value=0 },
    { Key=ag_1_5periods_network_errors ,  Value=0 },
    { Key=ag_1_5periods_wrong_replies ,  Value=0 },
    { Key=ag_1_5periods_warnings ,  Value=0 },
    { Key=ag_1_5periods_succeeded_queries ,  Value=146 },
    { Key=ag_1_5periods_msecsperquery ,  Value=230.85 }],
  error= ,
  total=0,
  warning= }
C#
utilsApi.Sql("SHOW AGENT STATUS");
{columns=[{ Key : { type=string }},
              { Value : { type=string }}],
  data : [
    { Key=status_period_seconds ,  Value=60 },
    { Key=status_stored_periods ,  Value=15 },
    { Key=ag_0_hostname ,  Value=192.168.0.202:6713 },
    { Key=ag_0_references ,  Value=2 },
    { Key=ag_0_lastquery ,  Value=0.41 },
    { Key=ag_0_lastanswer ,  Value=0.19 },
    { Key=ag_0_lastperiodmsec ,  Value=222 },
    { Key=ag_0_errorsarow ,  Value=0 },
    { Key=ag_0_1periods_query_timeouts ,  Value=0 },
    { Key=ag_0_1periods_connect_timeouts ,  Value=0 },
    { Key=ag_0_1periods_connect_failures ,  Value=0 },
    { Key=ag_0_1periods_network_errors ,  Value=0 },
    { Key=ag_0_1periods_wrong_replies ,  Value=0 },
    { Key=ag_0_1periods_unexpected_closings ,  Value=0 },
    { Key=ag_0_1periods_warnings ,  Value=0 },
    { Key=ag_0_1periods_succeeded_queries ,  Value=27 },
    { Key=ag_0_1periods_msecsperquery ,  Value=232.31 },
    { Key=ag_0_5periods_query_timeouts ,  Value=0 },
    { Key=ag_0_5periods_connect_timeouts ,  Value=0 },
    { Key=ag_0_5periods_connect_failures ,  Value=0 },
    { Key=ag_0_5periods_network_errors ,  Value=0 },
    { Key=ag_0_5periods_wrong_replies ,  Value=0 },
    { Key=ag_0_5periods_unexpected_closings ,  Value=0 },
    { Key=ag_0_5periods_warnings ,  Value=0 },
    { Key=ag_0_5periods_succeeded_queries ,  Value=146 },
    { Key=ag_0_5periods_msecsperquery ,  Value=231.83 },
    { Key=ag_1_hostname 192.168.0.202:6714 },
    { Key=ag_1_references ,  Value=2 },
    { Key=ag_1_lastquery ,  Value=0.41 },
    { Key=ag_1_lastanswer ,  Value=0.19 },
    { Key=ag_1_lastperiodmsec ,  Value=220 },
    { Key=ag_1_errorsarow ,  Value=0 },
    { Key=ag_1_1periods_query_timeouts ,  Value=0 },
    { Key=ag_1_1periods_connect_timeouts ,  Value=0 },
    { Key=ag_1_1periods_connect_failures ,  Value=0 },
    { Key=ag_1_1periods_network_errors ,  Value=0 },
    { Key=ag_1_1periods_wrong_replies ,  Value=0 },
    { Key=ag_1_1periods_unexpected_closings ,  Value=0 },
    { Key=ag_1_1periods_warnings ,  Value=0 },
    { Key=ag_1_1periods_succeeded_queries ,  Value=27 },
    { Key=ag_1_1periods_msecsperquery ,  Value=231.24 },
    { Key=ag_1_5periods_query_timeouts ,  Value=0 },
    { Key=ag_1_5periods_connect_timeouts ,  Value=0 },
    { Key=ag_1_5periods_connect_failures ,  Value=0 },
    { Key=ag_1_5periods_network_errors ,  Value=0 },
    { Key=ag_1_5periods_wrong_replies ,  Value=0 },
    { Key=ag_1_5periods_warnings ,  Value=0 },
    { Key=ag_1_5periods_succeeded_queries ,  Value=146 },
    { Key=ag_1_5periods_msecsperquery ,  Value=230.85 }],
  error="" ,
  total=0,
  warning="" }
TypeScript
res = await utilsApi.sql("SHOW AGENT STATUS");
{
    "columns": 
    [{
        "Key": 
        {
            "type": "string"
        }
    },
    {
        "Value":
        {
            "type": "string"
        }
    }],
    "data": 
    [
        {"Key": "status_period_seconds", "Value": "60"},
        {"Key": "status_stored_periods", "Value": "15"},
        {"Key": "ag_0_hostname", "Value": "192.168.0.202:6713"},
        {"Key": "ag_0_references", "Value": "2"},
        {"Key": "ag_0_lastquery", "Value": "0.41"},
        {"Key": "ag_0_lastanswer", "Value": "0.19"},
        {"Key": "ag_0_lastperiodmsec", "Value": "222"},
        {"Key": "ag_0_errorsarow", "Value": "0"},
        {"Key": "ag_0_1periods_query_timeouts", "Value": "0"},
        {"Key": "ag_0_1periods_connect_timeouts", "Value": "0"},
        {"Key": "ag_0_1periods_connect_failures", "Value": "0"},
        {"Key": "ag_0_1periods_network_errors", "Value": "0"},
        {"Key": "ag_0_1periods_wrong_replies", "Value": "0"},
        {"Key": "ag_0_1periods_unexpected_closings", "Value": "0"},
        {"Key": "ag_0_1periods_warnings", "Value": "0"},
        {"Key": "ag_0_1periods_succeeded_queries", "Value": "27"},
        {"Key": "ag_0_1periods_msecsperquery", "Value": "232.31"},
        {"Key": "ag_0_5periods_query_timeouts", "Value": "0"},
        {"Key": "ag_0_5periods_connect_timeouts", "Value": "0"},
        {"Key": "ag_0_5periods_connect_failures", "Value": "0"},
        {"Key": "ag_0_5periods_network_errors", "Value": "0"},
        {"Key": "ag_0_5periods_wrong_replies", "Value": "0"},
        {"Key": "ag_0_5periods_unexpected_closings", "Value": "0"},
        {"Key": "ag_0_5periods_warnings", "Value": "0"},
        {"Key": "ag_0_5periods_succeeded_queries", "Value": "146"},
        {"Key": "ag_0_5periods_msecsperquery", "Value": "231.83"},
        {"Key": "ag_1_hostname 192.168.0.202:6714"},
        {"Key": "ag_1_references", "Value": "2"},
        {"Key": "ag_1_lastquery", "Value": "0.41"},
        {"Key": "ag_1_lastanswer", "Value": "0.19"},
        {"Key": "ag_1_lastperiodmsec", "Value": "220"},
        {"Key": "ag_1_errorsarow", "Value": "0"},
        {"Key": "ag_1_1periods_query_timeouts", "Value": "0"},
        {"Key": "ag_1_1periods_connect_timeouts", "Value": "0"},
        {"Key": "ag_1_1periods_connect_failures", "Value": "0"},
        {"Key": "ag_1_1periods_network_errors", "Value": "0"},
        {"Key": "ag_1_1periods_wrong_replies", "Value": "0"},
        {"Key": "ag_1_1periods_unexpected_closings", "Value": "0"},
        {"Key": "ag_1_1periods_warnings", "Value": "0"},
        {"Key": "ag_1_1periods_succeeded_queries", "Value": "27"},
        {"Key": "ag_1_1periods_msecsperquery", "Value": "231.24"},
        {"Key": "ag_1_5periods_query_timeouts", "Value": "0"},
        {"Key": "ag_1_5periods_connect_timeouts", "Value": "0"},
        {"Key": "ag_1_5periods_connect_failures", "Value": "0"},
        {"Key": "ag_1_5periods_network_errors", "Value": "0"},
        {"Key": "ag_1_5periods_wrong_replies", "Value": "0"},
        {"Key": "ag_1_5periods_warnings", "Value": "0"},
        {"Key": "ag_1_5periods_succeeded_queries", "Value": "146"},
        {"Key": "ag_1_5periods_msecsperquery", "Value": "230.85"}
    ],
    "error": "",
    "total": 0,
    "warning": ""
 }
Go
res := apiClient.UtilsAPI.Sql(context.Background()).Body("SHOW AGENT STATUS").Execute()
{
    "columns": 
    [{
        "Key": 
        {
            "type": "string"
        }
    },
    {
        "Value":
        {
            "type": "string"
        }
    }],
    "data": 
    [
        {"Key": "status_period_seconds", "Value": "60"},
        {"Key": "status_stored_periods", "Value": "15"},
        {"Key": "ag_0_hostname", "Value": "192.168.0.202:6713"},
        {"Key": "ag_0_references", "Value": "2"},
        {"Key": "ag_0_lastquery", "Value": "0.41"},
        {"Key": "ag_0_lastanswer", "Value": "0.19"},
        {"Key": "ag_0_lastperiodmsec", "Value": "222"},
        {"Key": "ag_0_errorsarow", "Value": "0"},
        {"Key": "ag_0_1periods_query_timeouts", "Value": "0"},
        {"Key": "ag_0_1periods_connect_timeouts", "Value": "0"},
        {"Key": "ag_0_1periods_connect_failures", "Value": "0"},
        {"Key": "ag_0_1periods_network_errors", "Value": "0"},
        {"Key": "ag_0_1periods_wrong_replies", "Value": "0"},
        {"Key": "ag_0_1periods_unexpected_closings", "Value": "0"},
        {"Key": "ag_0_1periods_warnings", "Value": "0"},
        {"Key": "ag_0_1periods_succeeded_queries", "Value": "27"},
        {"Key": "ag_0_1periods_msecsperquery", "Value": "232.31"},
        {"Key": "ag_0_5periods_query_timeouts", "Value": "0"},
        {"Key": "ag_0_5periods_connect_timeouts", "Value": "0"},
        {"Key": "ag_0_5periods_connect_failures", "Value": "0"},
        {"Key": "ag_0_5periods_network_errors", "Value": "0"},
        {"Key": "ag_0_5periods_wrong_replies", "Value": "0"},
        {"Key": "ag_0_5periods_unexpected_closings", "Value": "0"},
        {"Key": "ag_0_5periods_warnings", "Value": "0"},
        {"Key": "ag_0_5periods_succeeded_queries", "Value": "146"},
        {"Key": "ag_0_5periods_msecsperquery", "Value": "231.83"},
        {"Key": "ag_1_hostname 192.168.0.202:6714"},
        {"Key": "ag_1_references", "Value": "2"},
        {"Key": "ag_1_lastquery", "Value": "0.41"},
        {"Key": "ag_1_lastanswer", "Value": "0.19"},
        {"Key": "ag_1_lastperiodmsec", "Value": "220"},
        {"Key": "ag_1_errorsarow", "Value": "0"},
        {"Key": "ag_1_1periods_query_timeouts", "Value": "0"},
        {"Key": "ag_1_1periods_connect_timeouts", "Value": "0"},
        {"Key": "ag_1_1periods_connect_failures", "Value": "0"},
        {"Key": "ag_1_1periods_network_errors", "Value": "0"},
        {"Key": "ag_1_1periods_wrong_replies", "Value": "0"},
        {"Key": "ag_1_1periods_unexpected_closings", "Value": "0"},
        {"Key": "ag_1_1periods_warnings", "Value": "0"},
        {"Key": "ag_1_1periods_succeeded_queries", "Value": "27"},
        {"Key": "ag_1_1periods_msecsperquery", "Value": "231.24"},
        {"Key": "ag_1_5periods_query_timeouts", "Value": "0"},
        {"Key": "ag_1_5periods_connect_timeouts", "Value": "0"},
        {"Key": "ag_1_5periods_connect_failures", "Value": "0"},
        {"Key": "ag_1_5periods_network_errors", "Value": "0"},
        {"Key": "ag_1_5periods_wrong_replies", "Value": "0"},
        {"Key": "ag_1_5periods_warnings", "Value": "0"},
        {"Key": "ag_1_5periods_succeeded_queries", "Value": "146"},
        {"Key": "ag_1_5periods_msecsperquery", "Value": "230.85"}
    ],
    "error": "",
    "total": 0,
    "warning": ""
 }

An optional LIKE clause is supported, with the syntax being the same as in SHOW STATUS.

SQL
SHOW AGENT STATUS LIKE '%5period%msec%';
+-----------------------------+--------+
| Key                         | Value  |
+-----------------------------+--------+
| ag_0_5periods_msecsperquery | 234.72 |
| ag_1_5periods_msecsperquery | 233.73 |
| ag_2_5periods_msecsperquery | 343.81 |
+-----------------------------+--------+
3 rows in set (0.00 sec)
PHP
$client->nodes()->agentstatus(
    ['body'=>
        ['pattern'=>'%5period%msec%']
    ]
);
Array(
    [ag_0_5periods_msecsperquery] => 234.72
    [ag_1_5periods_msecsperquery] => 233.73
    [ag_2_5periods_msecsperquery] => 343.81
)
Python
utilsApi.sql('SHOW AGENT STATUS LIKE \'%5period%msec%\'')
{u'columns': [{u'Key': {u'type': u'string'}},
              {u'Value': {u'type': u'string'}}],
 u'data': [
    {u'Key': u'ag_0_5periods_msecsperquery', u'Value': u'234.72'},
    {u'Key': u'ag_1_5periods_msecsperquery', u'Value': u'233.73'},
    {u'Key': u'ag_2_5periods_msecsperquery', u'Value': u'343.81'}],
 u'error': u'',
 u'total': 0,
 u'warning': u''}
javascript
res = await utilsApi.sql("SHOW AGENT STATUS LIKE \"%5period%msec%\"");
{"columns": [{"Key": {"type": "string"}},
              {"Value": {"type": "string"}}],
 "data": [
    {"Key": "ag_0_5periods_msecsperquery", "Value": "234.72"},
    {"Key": "ag_1_5periods_msecsperquery", "Value": "233.73"},
    {"Key": "ag_2_5periods_msecsperquery", "Value": "343.81"}],
 "error": "",
 "total": 0,
 "warning": ""}
Java
utilsApi.sql("SHOW AGENT STATUS LIKE \"%5period%msec%\"");
{columns: [{Key={type=string}},
              {Value={type=string}}],
 data: [
    {Key=ag_0_5periods_msecsperquery, Value=234.72},
    {Key=ag_1_5periods_msecsperquery, Value=233.73},
    {Key=ag_2_5periods_msecsperquery, Value=343.81}],
 error: ,
 total: 0,
 warning: }
C#
utilsApi.Sql("SHOW AGENT STATUS LIKE \"%5period%msec%\"");
{columns: [{Key={type=string}},
              {Value={type=string}}],
 data: [
    {Key=ag_0_5periods_msecsperquery, Value=234.72},
    {Key=ag_1_5periods_msecsperquery, Value=233.73},
    {Key=ag_2_5periods_msecsperquery, Value=343.81}],
 error: "",
 total: 0,
 warning: ""}
TypeScript
res = await utilsApi.sql("SHOW AGENT STATUS LIKE \"%5period%msec%\"");
{
    "columns": 
    [{
        "Key": {"type": "string"}
    },
    {
        "Value": {"type": "string"}
    }],
    "data": 
    [
        {"Key": "ag_0_5periods_msecsperquery", "Value": "234.72"},
        {"Key": "ag_1_5periods_msecsperquery", "Value": "233.73"},
        {"Key": "ag_2_5periods_msecsperquery", "Value": "343.81"}
    ],
    "error": "",
    "total": 0,
    "warning": ""
}
Go
apiClient.UtilsAPI.Sql(context.Background()).Body("SHOW AGENT STATUS LIKE \"%5period%msec%\"").Execute()
{
    "columns": 
    [{
        "Key": {"type": "string"}
    },
    {
        "Value": {"type": "string"}
    }],
    "data": 
    [
        {"Key": "ag_0_5periods_msecsperquery", "Value": "234.72"},
        {"Key": "ag_1_5periods_msecsperquery", "Value": "233.73"},
        {"Key": "ag_2_5periods_msecsperquery", "Value": "343.81"}
    ],
    "error": "",
    "total": 0,
    "warning": ""
}

You can specify a particular agent by its address. In this case, only that agent's data will be displayed. Additionally, the agent_ prefix will be used instead of ag_N_:

SQL
SHOW AGENT '192.168.0.202:6714' STATUS LIKE '%15periods%';
+-------------------------------------+--------+
| Variable_name                       | Value  |
+-------------------------------------+--------+
| agent_15periods_query_timeouts      | 0      |
| agent_15periods_connect_timeouts    | 0      |
| agent_15periods_connect_failures    | 0      |
| agent_15periods_network_errors      | 0      |
| agent_15periods_wrong_replies       | 0      |
| agent_15periods_unexpected_closings | 0      |
| agent_15periods_warnings            | 0      |
| agent_15periods_succeeded_queries   | 439    |
| agent_15periods_msecsperquery       | 231.73 |
+-------------------------------------+--------+
9 rows in set (0.00 sec)
PHP
$client->nodes()->agentstatus(
    ['body'=>
        ['agent'=>'192.168.0.202:6714'],
        ['pattern'=>'%5period%msec%']
    ]
);
Array(
    [agent_15periods_query_timeouts] => 0
    [agent_15periods_connect_timeouts] => 0
    [agent_15periods_connect_failures] => 0
    [agent_15periods_network_errors] => 0
    [agent_15periods_wrong_replies] => 0
    [agent_15periods_unexpected_closings] => 0
    [agent_15periods_warnings] => 0
    [agent_15periods_succeeded_queries] => 439
    [agent_15periods_msecsperquery] => 231.73
)
Python
utilsApi.sql('SHOW AGENT \'192.168.0.202:6714\' STATUS LIKE \'%15periods%\'')
{u'columns': [{u'Key': {u'type': u'string'}},
              {u'Value': {u'type': u'string'}}],
 u'data': [
    {u'Key': u'agent_15periods_query_timeouts', u'Value': u'0'},
    {u'Key': u'agent_15periods_connect_timeouts', u'Value': u'0'},
    {u'Key': u'agent_15periods_connect_failures', u'Value': u'0'},
    {u'Key': u'agent_15periods_network_errors', u'Value': u'0'},
    {u'Key': u'agent_15periods_connect_failures', u'Value': u'0'},
    {u'Key': u'agent_15periods_wrong_replies', u'Value': u'0'},
    {u'Key': u'agent_15periods_unexpected_closings', u'Value': u'0'},
    {u'Key': u'agent_15periods_warnings', u'Value': u'0'},
    {u'Key': u'agent_15periods_succeeded_queries', u'Value': u'439'},
    {u'Key': u'agent_15periods_msecsperquery', u'Value': u'233.73'},
    ],
 u'error': u'',
 u'total': 0,
 u'warning': u''}
javascript
res = await utilsApi.sql("SHOW AGENT \"192.168.0.202:6714\" STATUS LIKE \"%15periods%\"");
{"columns": [{"Key": {"type": "string"}},
              {"Value": {"type": "string"}}],
 "data": [
    {"Key": "agent_15periods_query_timeouts", "Value": "0"},
    {"Key": "agent_15periods_connect_timeouts", "Value": "0"},
    {"Key": "agent_15periods_connect_failures", "Value": "0"},
    {"Key": "agent_15periods_network_errors", "Value": "0"},
    {"Key": "agent_15periods_connect_failures", "Value": "0"},
    {"Key": "agent_15periods_wrong_replies", "Value": "0"},
    {"Key": "agent_15periods_unexpected_closings", "Value": "0"},
    {"Key": "agent_15periods_warnings", "Value": "0"},
    {"Key": "agent_15periods_succeeded_queries", "Value": "439"},
    {"Key": "agent_15periods_msecsperquery", "Value": "233.73"},
    ],
 "error": "",
 "total": 0,
 "warning": ""}
Java
utilsApi.sql("SHOW AGENT \"192.168.0.202:6714\" STATUS LIKE \"%15periods%\"");
{columns=[{Key={type=string}},
              {Value={type=string}}],
 data=[
    {Key=agent_15periods_query_timeouts, Value=0},
    {Key=agent_15periods_connect_timeouts, Value=0},
    {Key=agent_15periods_connect_failures, Value=0},
    {Key=agent_15periods_network_errors, Value=0},
    {Key=agent_15periods_connect_failures, Value=0},
    {Key=agent_15periods_wrong_replies, Value=0},
    {Key=agent_15periods_unexpected_closings, Value=0},
    {Key=agent_15periods_warnings, Value=0},
    {Key=agent_15periods_succeeded_queries, Value=439},
    {Key=agent_15periods_msecsperquery, Value=233.73},
    ],
 error=,
 total=0,
 warning=}
C#
utilsApi.Sql("SHOW AGENT \"192.168.0.202:6714\" STATUS LIKE \"%15periods%\"");
{columns=[{Key={type=string}},
              {Value={type=string}}],
 data=[
    {Key=agent_15periods_query_timeouts, Value=0},
    {Key=agent_15periods_connect_timeouts, Value=0},
    {Key=agent_15periods_connect_failures, Value=0},
    {Key=agent_15periods_network_errors, Value=0},
    {Key=agent_15periods_connect_failures, Value=0},
    {Key=agent_15periods_wrong_replies, Value=0},
    {Key=agent_15periods_unexpected_closings, Value=0},
    {Key=agent_15periods_warnings, Value=0},
    {Key=agent_15periods_succeeded_queries, Value=439},
    {Key=agent_15periods_msecsperquery, Value=233.73},
    ],
 error="",
 total=0,
 warning=""}
TypeScript
res = await utilsApi.sql("SHOW AGENT \"192.168.0.202:6714\" STATUS LIKE \"%15periods%\"");
{
    "columns": 
    [{
        {"Key": {"type": "string"}
    },
        {"Value": {"type": "string"}
    }],
    "data": 
    [
        {"Key": "agent_15periods_query_timeouts", "Value": "0"},
        {"Key": "agent_15periods_connect_timeouts", "Value": "0"},
        {"Key": "agent_15periods_connect_failures", "Value": "0"},
        {"Key": "agent_15periods_network_errors", "Value": "0"},
        {"Key": "agent_15periods_connect_failures", "Value": "0"},
        {"Key": "agent_15periods_wrong_replies", "Value": "0"},
        {"Key": "agent_15periods_unexpected_closings", "Value": "0"},
        {"Key": "agent_15periods_warnings", "Value": "0"},
        {"Key": "agent_15periods_succeeded_queries", "Value": "439"},
        {"Key": "agent_15periods_msecsperquery", "Value": "233.73"},
    ],
    "error": "",
    "total": 0,
    "warning": ""
}
Go
apiClient.UtilsAPI.Sql(context.Background()).Body("SHOW AGENT \"192.168.0.202:6714\" STATUS LIKE \"%15periods%\").Execute()
{
    "columns": 
    [{
        {"Key": {"type": "string"}
    },
        {"Value": {"type": "string"}
    }],
    "data": 
    [
        {"Key": "agent_15periods_query_timeouts", "Value": "0"},
        {"Key": "agent_15periods_connect_timeouts", "Value": "0"},
        {"Key": "agent_15periods_connect_failures", "Value": "0"},
        {"Key": "agent_15periods_network_errors", "Value": "0"},
        {"Key": "agent_15periods_connect_failures", "Value": "0"},
        {"Key": "agent_15periods_wrong_replies", "Value": "0"},
        {"Key": "agent_15periods_unexpected_closings", "Value": "0"},
        {"Key": "agent_15periods_warnings", "Value": "0"},
        {"Key": "agent_15periods_succeeded_queries", "Value": "439"},
        {"Key": "agent_15periods_msecsperquery", "Value": "233.73"},
    ],
    "error": "",
    "total": 0,
    "warning": ""
}

Finally, you can check the status of the agents in a specific distributed table using the SHOW AGENT index_name STATUS statement. This statement displays the table's HA status (i.e., whether or not it uses agent mirrors at all) and provides information on the mirrors, including: address, blackhole and persistent flags, and the mirror selection probability used when one of the weighted probability strategies is in effect.

SQL
SHOW AGENT dist_index STATUS;
+--------------------------------------+--------------------------------+
| Variable_name                        | Value                          |
+--------------------------------------+--------------------------------+
| dstindex_1_is_ha                     | 1                              |
| dstindex_1mirror1_id                 | 192.168.0.202:6713:loc         |
| dstindex_1mirror1_probability_weight | 0.372864                       |
| dstindex_1mirror1_is_blackhole       | 0                              |
| dstindex_1mirror1_is_persistent      | 0                              |
| dstindex_1mirror2_id                 | 192.168.0.202:6714:loc         |
| dstindex_1mirror2_probability_weight | 0.374635                       |
| dstindex_1mirror2_is_blackhole       | 0                              |
| dstindex_1mirror2_is_persistent      | 0                              |
| dstindex_1mirror3_id                 | dev1.manticoresearch.com:6714:loc |
| dstindex_1mirror3_probability_weight | 0.252501                       |
| dstindex_1mirror3_is_blackhole       | 0                              |
| dstindex_1mirror3_is_persistent      | 0                              |
+--------------------------------------+--------------------------------+
13 rows in set (0.00 sec)
PHP
$client->nodes()->agentstatus(
    ['body'=>
        ['agent'=>'dist_index']
    ]
);
Array(
    [dstindex_1_is_ha] => 1
    [dstindex_1mirror1_id] => 192.168.0.202:6713:loc
    [dstindex_1mirror1_probability_weight] => 0.372864
    [dstindex_1mirror1_is_blackhole] => 0
    [dstindex_1mirror1_is_persistent] => 0
    [dstindex_1mirror2_id] => 192.168.0.202:6714:loc
    [dstindex_1mirror2_probability_weight] => 0.374635
    [dstindex_1mirror2_is_blackhole] => 0
    [dstindex_1mirror2_is_persistent] => 0
    [dstindex_1mirror3_id] => dev1.manticoresearch.com:6714:loc
    [dstindex_1mirror3_probability_weight] => 0.252501
    [dstindex_1mirror3_is_blackhole] => 0
    [dstindex_1mirror3_is_persistent] => 0
)
Python
utilsApi.sql('SHOW AGENT \'192.168.0.202:6714\' STATUS LIKE \'%15periods%\'')
{u'columns': [{u'Key': {u'type': u'string'}},
              {u'Value': {u'type': u'string'}}],
 u'data': [
    {u'Key': u'dstindex_1_is_ha', u'Value': u'1'},
    {u'Key': u'dstindex_1mirror1_id', u'Value': u'192.168.0.202:6713:loc'},
    {u'Key': u'dstindex_1mirror1_probability_weight', u'Value': u'0.372864'},
    {u'Key': u'dstindex_1mirror1_is_blackhole', u'Value': u'0'},
    {u'Key': u'dstindex_1mirror1_is_persistent', u'Value': u'0'},
    {u'Key': u'dstindex_1mirror2_id', u'Value': u'192.168.0.202:6714:loc'},
    {u'Key': u'dstindex_1mirror2_probability_weight', u'Value': u'0.374635'},
    {u'Key': u'dstindex_1mirror2_is_blackhole', u'Value': u'0'},
    {u'Key': u'dstindex_1mirror2_is_persistent', u'Value': u'439'},
    {u'Key': u'dstindex_1mirror3_id', u'Value': u'dev1.manticoresearch.com:6714:loc'},
    {u'Key': u'dstindex_1mirror3_probability_weight', u'Value': u' 0.252501'},
    {u'Key': u'dstindex_1mirror3_is_blackhole', u'Value': u'0'},
    {u'Key': u'dstindex_1mirror3_is_persistent', u'Value': u'439'}    
    ],
 u'error': u'',
 u'total': 0,
 u'warning': u''}
javascript
res = await utilsApi.sql("SHOW AGENT \"192.168.0.202:6714\" STATUS LIKE \"%15periods%\"");
{"columns": [{"Key": {"type": "string"}},
              {"Value": {"type": "string"}}],
 "data": [
    {"Key": "dstindex_1_is_ha", "Value": "1"},
    {"Key": "dstindex_1mirror1_id", "Value": "192.168.0.202:6713:loc"},
    {"Key": "dstindex_1mirror1_probability_weight", "Value": "0.372864"},
    {"Key": "dstindex_1mirror1_is_blackhole", "Value": "0"},
    {"Key": "dstindex_1mirror1_is_persistent", "Value": "0"},
    {"Key": "dstindex_1mirror2_id", "Value": "192.168.0.202:6714:loc"},
    {"Key": "dstindex_1mirror2_probability_weight", "Value": "0.374635"},
    {"Key": "dstindex_1mirror2_is_blackhole", "Value": "0"},
    {"Key": "dstindex_1mirror2_is_persistent", "Value": "439"},
    {"Key": "dstindex_1mirror3_id", "Value": "dev1.manticoresearch.com:6714:loc"},
    {"Key": "dstindex_1mirror3_probability_weight", "Value": " 0.252501"},
    {"Key": "dstindex_1mirror3_is_blackhole", "Value": "0"},
    {"Key": "dstindex_1mirror3_is_persistent", "Value": "439"}    
    ],
 "error": "",
 "total": 0,
 "warning": ""}
Java
utilsApi.sql("SHOW AGENT \"192.168.0.202:6714\" STATUS LIKE \"%15periods%\"");
{columns=[{Key={type=string}},
              {Value={type=string}}],
 data=[
    {Key=dstindex_1_is_ha, Value=1},
    {Key=dstindex_1mirror1_id, Value=192.168.0.202:6713:loc},
    {Key=dstindex_1mirror1_probability_weight, Value=0.372864},
    {Key=dstindex_1mirror1_is_blackhole, Value=0},
    {Key=dstindex_1mirror1_is_persistent, Value=0},
    {Key=dstindex_1mirror2_id, Value=192.168.0.202:6714:loc},
    {Key=dstindex_1mirror2_probability_weight, Value=0.374635},
    {Key=dstindex_1mirror2_is_blackhole, Value=0},
    {Key=dstindex_1mirror2_is_persistent, Value=439},
    {Key=dstindex_1mirror3_id, Value=dev1.manticoresearch.com:6714:loc},
    {Key=dstindex_1mirror3_probability_weight, Value= 0.252501},
    {Key=dstindex_1mirror3_is_blackhole, Value=0},
    {Key=dstindex_1mirror3_is_persistent, Value=439}    
    ],
 error=,
 total=0,
 warning=}
C#
utilsApi.Sql("SHOW AGENT \"192.168.0.202:6714\" STATUS LIKE \"%15periods%\"");
{columns=[{Key={type=string}},
              {Value={type=string}}],
 data=[
    {Key=dstindex_1_is_ha, Value=1},
    {Key=dstindex_1mirror1_id, Value=192.168.0.202:6713:loc},
    {Key=dstindex_1mirror1_probability_weight, Value=0.372864},
    {Key=dstindex_1mirror1_is_blackhole, Value=0},
    {Key=dstindex_1mirror1_is_persistent, Value=0},
    {Key=dstindex_1mirror2_id, Value=192.168.0.202:6714:loc},
    {Key=dstindex_1mirror2_probability_weight, Value=0.374635},
    {Key=dstindex_1mirror2_is_blackhole, Value=0},
    {Key=dstindex_1mirror2_is_persistent, Value=439},
    {Key=dstindex_1mirror3_id, Value=dev1.manticoresearch.com:6714:loc},
    {Key=dstindex_1mirror3_probability_weight, Value= 0.252501},
    {Key=dstindex_1mirror3_is_blackhole, Value=0},
    {Key=dstindex_1mirror3_is_persistent, Value=439}    
    ],
 error="",
 total=0,
 warning=""}
TypeScript
res = await utilsApi.sql("SHOW AGENT \"192.168.0.202:6714\" STATUS LIKE \"%15periods%\"");
{
    "columns": 
    [{
        "Key": {"type": "string"}},
        {"Value": {"type": "string"}
    }],
    "data": 
    [
        {"Key": "dstindex_1_is_ha", "Value": "1"},
        {"Key": "dstindex_1mirror1_id", "Value": "192.168.0.202:6713:loc"},
        {"Key": "dstindex_1mirror1_probability_weight", "Value": "0.372864"},
        {"Key": "dstindex_1mirror1_is_blackhole", "Value": "0"},
        {"Key": "dstindex_1mirror1_is_persistent", "Value": "0"},
        {"Key": "dstindex_1mirror2_id", "Value": "192.168.0.202:6714:loc"},
        {"Key": "dstindex_1mirror2_probability_weight", "Value": "0.374635"},
        {"Key": "dstindex_1mirror2_is_blackhole", "Value": "0"},
        {"Key": "dstindex_1mirror2_is_persistent", "Value": "439"},
        {"Key": "dstindex_1mirror3_id", "Value": "dev1.manticoresearch.com:6714:loc"},
        {"Key": "dstindex_1mirror3_probability_weight", "Value": " 0.252501"},
        {"Key": "dstindex_1mirror3_is_blackhole", "Value": "0"},
        {"Key": "dstindex_1mirror3_is_persistent", "Value": "439"}    
    ],
    "error": "",
    "total": 0,
    "warning": ""
}
Go
apiClient.UtilsAPI.Sql(context.Background()).Body("SHOW AGENT \"192.168.0.202:6714\" STATUS LIKE \"%15periods%\"").Execute()
{
    "columns": 
    [{
        "Key": {"type": "string"}},
        {"Value": {"type": "string"}
    }],
    "data": 
    [
        {"Key": "dstindex_1_is_ha", "Value": "1"},
        {"Key": "dstindex_1mirror1_id", "Value": "192.168.0.202:6713:loc"},
        {"Key": "dstindex_1mirror1_probability_weight", "Value": "0.372864"},
        {"Key": "dstindex_1mirror1_is_blackhole", "Value": "0"},
        {"Key": "dstindex_1mirror1_is_persistent", "Value": "0"},
        {"Key": "dstindex_1mirror2_id", "Value": "192.168.0.202:6714:loc"},
        {"Key": "dstindex_1mirror2_probability_weight", "Value": "0.374635"},
        {"Key": "dstindex_1mirror2_is_blackhole", "Value": "0"},
        {"Key": "dstindex_1mirror2_is_persistent", "Value": "439"},
        {"Key": "dstindex_1mirror3_id", "Value": "dev1.manticoresearch.com:6714:loc"},
        {"Key": "dstindex_1mirror3_probability_weight", "Value": " 0.252501"},
        {"Key": "dstindex_1mirror3_is_blackhole", "Value": "0"},
        {"Key": "dstindex_1mirror3_is_persistent", "Value": "439"}    
    ],
    "error": "",
    "total": 0,
    "warning": ""
}
¶ 19.2

SHOW META

SHOW META [ LIKE pattern ]

SHOW META is an SQL statement that displays additional meta-information about the processed query, including the query time, keyword statistics, and information about the secondary indexes used. The syntax is:

The included items are:

SQL
SELECT id, story_author FROM hn_small WHERE MATCH('one|two|three') and comment_ranking > 2 limit 5;
show meta;
+---------+--------------+
| id      | story_author |
+---------+--------------+
|  151171 | anewkid      |
|  302758 | bks          |
|  805806 | drRoflol     |
| 1099245 | tnorthcutt   |
|  303252 | whiten       |
+---------+--------------+
5 rows in set (0.00 sec)

+----------------+---------------------------------------+
| Variable_name  | Value                                 |
+----------------+---------------------------------------+
| total          | 5                                     |
| total_found    | 2308                                  |
| total_relation | eq                                    |
| time           | 0.001                                 |
| keyword[0]     | one                                   |
| docs[0]        | 224387                                |
| hits[0]        | 310327                                |
| keyword[1]     | three                                 |
| docs[1]        | 18181                                 |
| hits[1]        | 21102                                 |
| keyword[2]     | two                                   |
| docs[2]        | 63251                                 |
| hits[2]        | 75961                                 |
| index          | comment_ranking:SecondaryIndex (100%) |
+----------------+---------------------------------------+
14 rows in set (0.00 sec)

SHOW META can display I/O and CPU counters, but they will only be available if searchd was started with the --iostats and --cpustats switches, respectively.

SQL
SELECT id,channel_id FROM records WHERE MATCH('one|two|three') limit 5;

SHOW META;
+--------+--------------+
| id     | story_author |
+--------+--------------+
| 300263 | throwaway37  |
| 713503 | mahmud       |
| 716804 | mahmud       |
| 776906 | jimbokun     |
| 753332 | foxhop       |
+--------+--------------+
5 rows in set (0.01 sec)

+-----------------------+--------+
| Variable_name         | Value  |
+-----------------------+--------+
| total                 | 5      |
| total_found           | 266385 |
| total_relation        | eq     |
| time                  | 0.011  |
| cpu_time              | 18.004 |
| agents_cpu_time       | 0.000  |
| io_read_time          | 0.000  |
| io_read_ops           | 0      |
| io_read_kbytes        | 0.0    |
| io_write_time         | 0.000  |
| io_write_ops          | 0      |
| io_write_kbytes       | 0.0    |
| agent_io_read_time    | 0.000  |
| agent_io_read_ops     | 0      |
| agent_io_read_kbytes  | 0.0    |
| agent_io_write_time   | 0.000  |
| agent_io_write_ops    | 0      |
| agent_io_write_kbytes | 0.0    |
| keyword[0]            | one    |
| docs[0]               | 224387 |
| hits[0]               | 310327 |
| keyword[1]            | three  |
| docs[1]               | 18181  |
| hits[1]               | 21102  |
| keyword[2]            | two    |
| docs[2]               | 63251  |
| hits[2]               | 75961  |
+-----------------------+--------+
27 rows in set (0.00 sec)

Additional values, such as predicted_time, dist_predicted_time, local_fetched_docs, local_fetched_hits, local_fetched_skips, and their respective dist_fetched_* counterparts, will only be available if searchd was configured with predicted time costs and the query included predicted_time in the OPTION clause.

SQL
SELECT id,story_author FROM hn_small WHERE MATCH('one|two|three') limit 5 option max_predicted_time=100;

SHOW META;
+--------+--------------+
| id     | story_author |
+--------+--------------+
| 300263 | throwaway37  |
| 713503 | mahmud       |
| 716804 | mahmud       |
| 776906 | jimbokun     |
| 753332 | foxhop       |
+--------+--------------+
5 rows in set (0.01 sec)

mysql> show meta;
+---------------------+--------+
| Variable_name       | Value  |
+---------------------+--------+
| total               | 5      |
| total_found         | 266385 |
| total_relation      | eq     |
| time                | 0.012  |
| local_fetched_docs  | 307212 |
| local_fetched_hits  | 407390 |
| local_fetched_skips | 24     |
| predicted_time      | 56     |
| keyword[0]          | one    |
| docs[0]             | 224387 |
| hits[0]             | 310327 |
| keyword[1]          | three  |
| docs[1]             | 18181  |
| hits[1]             | 21102  |
| keyword[2]          | two    |
| docs[2]             | 63251  |
| hits[2]             | 75961  |
+---------------------+--------+
17 rows in set (0.00 sec)

SHOW META must be executed immediately after the query in the same session. Since some MySQL connectors/libraries use connection pools, running SHOW META in a separate statement can lead to unexpected results, such as retrieving metadata from another query. In these cases (and generally recommended), run a multiple statement containing both the query and SHOW META. Some connectors/libraries support multi-queries within the same method for a single statement, while others may require the use of a dedicated method for multi-queries or setting specific options during connection setup.

SQL
SELECT id,story_author FROM hn_small WHERE MATCH('one|two|three') LIMIT 5; SHOW META;
+--------+--------------+
| id     | story_author |
+--------+--------------+
| 300263 | throwaway37  |
| 713503 | mahmud       |
| 716804 | mahmud       |
| 776906 | jimbokun     |
| 753332 | foxhop       |
+--------+--------------+
5 rows in set (0.01 sec)

+----------------+--------+
| Variable_name  | Value  |
+----------------+--------+
| total          | 5      |
| total_found    | 266385 |
| total_relation | eq     |
| time           | 0.011  |
| keyword[0]     | one    |
| docs[0]        | 224387 |
| hits[0]        | 310327 |
| keyword[1]     | three  |
| docs[1]        | 18181  |
| hits[1]        | 21102  |
| keyword[2]     | two    |
| docs[2]        | 63251  |
| hits[2]        | 75961  |
+----------------+--------+
13 rows in set (0.00 sec)

You can also use the optional LIKE clause, which allows you to select only the variables that match a specific pattern. The pattern syntax follows standard SQL wildcards, where % represents any number of any characters, and _ represents a single character.

SQL
SHOW META LIKE 'total%';
+----------------+--------+
| Variable_name  | Value  |
+----------------+--------+
| total          | 5      |
| total_found    | 266385 |
| total_relation | eq     |
+----------------+--------+
3 rows in set (0.00 sec)

SHOW META and facets

When utilizing faceted search, you can examine the multiplier field in the SHOW META output to determine how many queries were executed in an optimized group.

SQL
SELECT * FROM facetdemo FACET brand_id FACET price FACET categories;
SHOW META LIKE 'multiplier';
+------+-------+----------+---------------------+-------------+-------------+---------------------------------------+------------+
| id   | price | brand_id | title               | brand_name  | property    | j                                     | categories |
+------+-------+----------+---------------------+-------------+-------------+---------------------------------------+------------+
|    1 |   306 |        1 | Product Ten Three   | Brand One   | Six_Ten     | {"prop1":66,"prop2":91,"prop3":"One"} | 10,11      |
...

+----------+----------+
| brand_id | count(*) |
+----------+----------+
|        1 |     1013 |
...

+-------+----------+
| price | count(*) |
+-------+----------+
|   306 |        7 |
...

+------------+----------+
| categories | count(*) |
+------------+----------+
|         10 |     2436 |
...

+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| multiplier    | 4     |
+---------------+-------+
1 row in set (0.00 sec)

SHOW META and query optimizer

When the cost-based query optimizer chooses to use DocidIndex, ColumnarScan, or SecondaryIndex instead of a plain filter, this is reflected in the SHOW META command.

The index variable displays the names and types of secondary indexes used during query execution. The percentage indicates how many disk chunks (in the case of an RT table) or pseudo shards (in the case of a plain table) utilized the secondary index.

SQL
SELECT count(*) FROM taxi1 WHERE tip_amount = 5;
SHOW META;
+----------------+----------------------------------+
| Variable_name  | Value                            |
+----------------+----------------------------------+
| total          | 1                                |
| total_found    | 1                                |
| total_relation | eq                               |
| time           | 0.016                            |
| index          | tip_amount:SecondaryIndex (100%) |
+----------------+----------------------------------+
5 rows in set (0.00 sec)

SHOW META for PQ tables

SHOW META can be used after executing a CALL PQ statement, in which case it provides different output.

SHOW META following a CALL PQ statement includes:

SQL
CALL PQ ('pq', ('{"title":"angry", "gid":3 }')); SHOW META;
+------+
| id   |
+------+
|    2 |
+------+
1 row in set (0.00 sec)

+-----------------------+-----------+
| Name                  | Value     |
+-----------------------+-----------+
| Total                 | 0.000 sec |
| Queries matched       | 1         |
| Queries failed        | 0         |
| Document matched      | 1         |
| Total queries stored  | 2         |
| Term only queries     | 2         |
| Fast rejected queries | 1         |
+-----------------------+-----------+
7 rows in set (0.00 sec)

Using CALL PQ with a verbose option provides more detailed output.

It includes the following additional entries:

SQL
CALL PQ ('pq', ('{"title":"angry", "gid":3 }'), 1 as verbose); SHOW META;
+------+
| id   |
+------+
|    2 |
+------+
1 row in set (0.00 sec)

+-------------------------+-----------+
| Name                    | Value     |
+-------------------------+-----------+
| Total                   | 0.000 sec |
| Setup                   | 0.000 sec |
| Queries matched         | 1         |
| Queries failed          | 0         |
| Document matched        | 1         |
| Total queries stored    | 2         |
| Term only queries       | 2         |
| Fast rejected queries   | 1         |
| Time per query          | 69        |
| Time of matched queries | 69        |
+-------------------------+-----------+
10 rows in set (0.00 sec)
¶ 19.3

SHOW THREADS

SHOW THREADS [ OPTION columns=width[,format=sphinxql][,format=all] ]

SHOW THREADS is an SQL statement that displays information about all threads and their current activities.

The resulting table contains the following columns:

SQL
SHOW THREADS;
*************************** 1. row ***************************
                TID: 83
               Name: work_1
              Proto: mysql
              State: query
    Connection from: 172.17.0.1:43300
             ConnID: 8
 This/prev job time: 630us
       CPU activity: 94.15%
          Jobs done: 2490
      Thread status: working
               Info: SHOW THREADS 

*************************** 2. row ***************************
                TID: 84
               Name: work_2
              Proto: mysql
              State: query
    Connection from: 172.17.0.1:43301
             ConnID: 9
 This/prev job time: 689us
       CPU activity: 89.23%
          Jobs done: 1830
      Thread status: working
               Info: show threads
JSON
POST /cli -d "SHOW THREADS"
+--------+---------+-------+-------+-----------------+--------+-----------------------+-----------+---------------+--------------+
| TID    | Name    | Proto | State | Connection from | ConnID | This/prev job time, s | Jobs done | Thread status | Info         |
+--------+---------+-------+-------+-----------------+--------+-----------------------+-----------+---------------+--------------+
| 501494 | work_23 | http  | query | 127.0.0.1:41300 | 1473   | 249us                 | 1681      | working       | show_threads |
+--------+---------+-------+-------+-----------------+--------+-----------------------+-----------+---------------+--------------+
PHP
require_once __DIR__ . '/vendor/autoload.php';
$config = ['host'=>'127.0.0.1','port'=>9308];
$client = new \Manticoresearch\Client($config);
print_r($client->nodes()->threads());
Array
(
    [0] => Array
        (
            [TID] => 506960
            [Name] => work_8
            [Proto] => http
            [State] => query
            [Connection from] => 127.0.0.1:38072
            [ConnID] => 17
            [This/prev job time, s] => 231us
            [CPU activity] => 93.54%
            [Jobs done] => 8
            [Thread status] => working
            [Info] => show_threads
        )

)
Python
import manticoresearch
config = manticoresearch.Configuration(
            host = "http://127.0.0.1:9308"
            )
client = manticoresearch.ApiClient(config)
utilsApi = manticoresearch.UtilsApi(client)
print(utilsApi.sql('SHOW THREADS'))
[{'columns': [{'TID': {'type': 'long'}}, {'Name': {'type': 'string'}}, {'Proto': {'type': 'string'}}, {'State': {'type': 'string'}}, {'Connection from': {'type': 'string'}}, {'ConnID': {'type': 'long long'}}, {'This/prev job time, s': {'type': 'string'}}, {'CPU activity': {'type': 'float'}}, {'Jobs done': {'type': 'long'}}, {'Thread status': {'type': 'string'}}, {'Info': {'type': 'string'}}], 'data': [{'TID': 506958, 'Name': 'work_6', 'Proto': 'http', 'State': 'query', 'Connection from': '127.0.0.1:38600', 'ConnID': 834, 'This/prev job time, s': '206us', 'CPU activity': '91.85%', 'Jobs done': 943, 'Thread status': 'working', 'Info': 'show_threads'}], 'total': 1, 'error': '', 'warning': ''}]
javascript
var Manticoresearch = require('manticoresearch');

var utilsApi = new Manticoresearch.UtilsApi();
async function showThreads() {
    res = await utilsApi.sql('SHOW THREADS');
    console.log(JSON.stringify(res, null, 4));
}

showThreads();
[
    {
        "columns": [
            {
                "TID": {
                    "type": "long"
                }
            },
            {
                "Name": {
                    "type": "string"
                }
            },
            {
                "Proto": {
                    "type": "string"
                }
            },
            {
                "State": {
                    "type": "string"
                }
            },
            {
                "Connection from": {
                    "type": "string"
                }
            },
            {
                "ConnID": {
                    "type": "long long"
                }
            },
            {
                "This/prev job time, s": {
                    "type": "string"
                }
            },
            {
                "CPU activity": {
                    "type": "float"
                }
            },
            {
                "Jobs done": {
                    "type": "long"
                }
            },
            {
                "Thread status": {
                    "type": "string"
                }
            },
            {
                "Info": {
                    "type": "string"
                }
            }
        ],
        "data": [
            {
                "TID": 506964,
                "Name": "work_12",
                "Proto": "http",
                "State": "query",
                "Connection from": "127.0.0.1:36656",
                "ConnID": 2884,
                "This/prev job time, s": "236us",
                "CPU activity": "91.73%",
                "Jobs done": 3328,
                "Thread status": "working",
                "Info": "show_threads"
            }
        ],
        "total": 1,
        "error": "",
        "warning": ""
    }
]
Java
utilsApi.sql("SHOW THREADS");
{
  columns=[
    {
      TID={
        type=string
      }
    },
    {
      Name={
        type=string
      }
    },
    {
      Proto={
        type=string
      }
    },
    {
      State={
        type=string
      }
    },
    {
      Connection from={
        type=string
      }
    },
    {
      ConnID={
        type=string
      }
    },
    {
      This/prev job time={
        type=string
      }
    },
    {
      CPU activity={
        type=string
      }
    },
    {
      Jobs done={
        type=string
      }
    },
    {
      Thread status={
        type=string
      }
    },
    {
      Info={
        type=string
      }
    }
  ],
  data=[
    {
      TID=82,
      Name=work_0,
      Proto=http,
      State=query,
      Connection from=172.17.0.1:60550,
      ConnID=163,
      This/prev job time=105us,
      CPU activity=44.68%,
      Jobs done=849,
      Thread status=working,
      Info=show_threads
    }
  ],
  total=0,
  error=,
  warning=
}
C#
utilsApi.Sql("SHOW THREADS");
{
  columns=[
    {
      TID={
        type=string
      }
    },
    {
      Name={
        type=string
      }
    },
    {
      Proto={
        type=string
      }
    },
    {
      State={
        type=string
      }
    },
    {
      Connection from={
        type=string
      }
    },
    {
      ConnID={
        type=string
      }
    },
    {
      This/prev job time= {
        type=string
      }
    },
    {
      Jobs done={
        type=string
      }
    },
    {
      Thread status={
        type=string
      }
    },
    {
      Info={
        type=string
      }
    }
  ],
  data=[
    {
      TID=83,
      Name=work_1,
      Proto=http,
      State=query,
      Connection from=172.17.0.1:41410,
      ConnID=6,
      This/prev job time=689us,
      Jobs done=159,
      Thread status=working,
      Info=show_threads
    }
  ],
  total=0,
  error="",
  warning=""
}
TypeScript
res = await utilsApi.sql('SHOW THREADS');
[
    {
        "columns": [
            {
                "TID": {
                    "type": "long"
                }
            },
            {
                "Name": {
                    "type": "string"
                }
            },
            {
                "Proto": {
                    "type": "string"
                }
            },
            {
                "State": {
                    "type": "string"
                }
            },
            {
                "Connection from": {
                    "type": "string"
                }
            },
            {
                "ConnID": {
                    "type": "long long"
                }
            },
            {
                "This/prev job time, s": {
                    "type": "string"
                }
            },
            {
                "CPU activity": {
                    "type": "float"
                }
            },
            {
                "Jobs done": {
                    "type": "long"
                }
            },
            {
                "Thread status": {
                    "type": "string"
                }
            },
            {
                "Info": {
                    "type": "string"
                }
            }
        ],
        "data": [
            {
                "TID": 506964,
                "Name": "work_12",
                "Proto": "http",
                "State": "query",
                "Connection from": "127.0.0.1:36656",
                "ConnID": 2884,
                "This/prev job time, s": "236us",
                "CPU activity": "91.73%",
                "Jobs done": 3328,
                "Thread status": "working",
                "Info": "show_threads"
            }
        ],
        "total": 1,
        "error": "",
        "warning": ""
    }
]
Go
apiClient.UtilsAPI.Sql(context.Background()).Body("SHOW THREADS").Execute()
[
    {
        "columns": [
            {
                "TID": {
                    "type": "long"
                }
            },
            {
                "Name": {
                    "type": "string"
                }
            },
            {
                "Proto": {
                    "type": "string"
                }
            },
            {
                "State": {
                    "type": "string"
                }
            },
            {
                "Connection from": {
                    "type": "string"
                }
            },
            {
                "ConnID": {
                    "type": "long long"
                }
            },
            {
                "This/prev job time, s": {
                    "type": "string"
                }
            },
            {
                "CPU activity": {
                    "type": "float"
                }
            },
            {
                "Jobs done": {
                    "type": "long"
                }
            },
            {
                "Thread status": {
                    "type": "string"
                }
            },
            {
                "Info": {
                    "type": "string"
                }
            }
        ],
        "data": [
            {
                "TID": 506964,
                "Name": "work_12",
                "Proto": "http",
                "State": "query",
                "Connection from": "127.0.0.1:36656",
                "ConnID": 2884,
                "This/prev job time, s": "236us",
                "CPU activity": "91.73%",
                "Jobs done": 3328,
                "Thread status": "working",
                "Info": "show_threads"
            }
        ],
        "total": 1,
        "error": "",
        "warning": ""
    }
]

The Info column displays:

You can limit the maximum width of the Info column by specifying the columns=N option.

By default, queries are displayed in their original format. However, when the format=sphinxql option is used, queries will be shown in SQL format, regardless of the protocol used for execution.

Using format=all will show all threads, while idling and system threads are hidden without this option (e.g., those busy with OPTIMIZE).

SQL
SHOW THREADS OPTION columns=30\G
JSON
POST /cli -d "SHOW THREADS OPTION columns=30"
PHP
$client->nodes()->threads(['body'=>['columns'=>30]]);
Python
utilsApi.sql('SHOW THREADS OPTION columns=30')
javascript
res = await utilsApi.sql('SHOW THREADS OPTION columns=30');
Java
utilsApi.sql("SHOW THREADS OPTION columns=30");
C#
utilsApi.Sql("SHOW THREADS OPTION columns=30");
TypeScript
res = await utilsApi.sql('SHOW THREADS OPTION columns=30');
Go
apiClient.UtilsAPI.Sql(context.Background()).Body("SHOW THREADS OPTION columns=30").Execute()
¶ 19.4

SHOW QUERIES

SHOW QUERIES

SHOW QUERIES returns information about all currently running queries. The output is a table with the following structure:

SQL
mysql> SHOW QUERIES;
+------+--------------+---------+----------+-----------------+
| id   | query        | time    | protocol | host            |
+------+--------------+---------+----------+-----------------+
|  111 | select       | 5ms ago | http     | 127.0.0.1:58986 |
|   96 | SHOW QUERIES | 255us   | mysql    | 127.0.0.1:33616 |
+------+--------------+---------+----------+-----------------+
2 rows in set (0.61 sec)

Refer to SHOW THREADS if you'd like to gain insight from the perspective of the threads themselves.

¶ 19.5

SHOW VERSION

SHOW VERSION

SHOW VERSION provides detailed version information of various components of the Manticore Search instance. This command is particularly useful for administrators and developers who need to verify the version of Manticore Search they are running, along with the versions of its associated components.

The output table includes two columns:
- Component: This column names the specific component of Manticore Search.
- Version: This column displays the version information for the respective component.

SQL
mysql> SHOW VERSION;
+-----------+--------------------------------+
| Component | Version                        |
+-----------+--------------------------------+
| Daemon    | 6.2.13 61cfe38d2@24011520 dev  |
| Columnar  | columnar 2.2.5 214ce90@240115  |
| Secondary | secondary 2.2.5 214ce90@240115 |
| KNN       | knn 2.2.5 214ce90@240115       |
| Buddy     | buddy v2.0.11                  |
+-----------+--------------------------------+
¶ 19.6

KILL

KILL <query id>

KILL terminates the execution of a query by its ID, which you can find in SHOW QUERIES.

SQL
mysql> KILL 4;
Query OK, 1 row affected (0.00 sec)
¶ 19.7

SHOW WARNINGS

SHOW WARNINGS statement can be used to retrieve the warning produced by the latest query. The error message will be returned along with the query itself:

mysql> SELECT * FROM test1 WHERE MATCH('@@title hello') \G
ERROR 1064 (42000): index test1: syntax error, unexpected TOK_FIELDLIMIT
near '@title hello'

mysql> SELECT * FROM test1 WHERE MATCH('@title -hello') \G
ERROR 1064 (42000): index test1: query is non-computable (single NOT operator)

mysql> SELECT * FROM test1 WHERE MATCH('"test doc"/3') \G

*************************** 1\. row ***************************
        id: 4
    weight: 2500
  group_id: 2
date_added: 1231721236
1 row in set, 1 warning (0.00 sec)

mysql> SHOW WARNINGS \G

*************************** 1\. row ***************************
  Level: warning
   Code: 1000
Message: quorum threshold too high (words=2, thresh=3); replacing quorum operator
         with AND operator
1 row in set (0.00 sec)
¶ 19.8

SHOW VARIABLES

SHOW [{GLOBAL | SESSION}] VARIABLES LIKE 'pattern'

It returns the current values of a few server-wide variables. Also, support for GLOBAL and SESSION clauses was added.

mysql> SHOW GLOBAL VARIABLES;
+--------------------------+-----------+
| Variable_name            | Value     |
+--------------------------+-----------+
| autocommit               | 1         |
| collation_connection     | libc_ci   |
| query_log_format         | sphinxql  |
| log_level                | info      |
| max_allowed_packet       | 134217728 |
| character_set_client     | utf8      |
| character_set_connection | utf8      |
| grouping_in_utc          | 0         |
| last_insert_id           | 123, 200  |
+--------------------------+-----------+
9 rows in set (0.00 sec)
mysql> show variables like '%log%';
+------------------+----------+
| Variable_name    | Value    |
+------------------+----------+
| query_log_format | sphinxql |
| log_level        | info     |
+------------------+----------+
2 rows in set (0.00 sec)
mysql> show session variables like 'autocommit';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| autocommit    | 0     |
+---------------+-------+
1 row in set (0.00 sec)
¶ 19.9

Profiling

¶ 19.9.1

Query profile

The SQL SHOW PROFILE statement and the "profile": true JSON interface option both provide a detailed execution profile of the executed query. In the case of SQL, profiling must be enabled in the current session before running the statement to be instrumented. This can be accomplished with the SET profiling=1 statement. By default, profiling is disabled to prevent potential performance implications, resulting in an empty profile if not enabled.

Each profiling result includes the following fields:

States in the profile are returned in a prerecorded order that roughly maps (but is not identical) to the actual query order.

The list of states may (and will) change over time as we refine the states. Here's a brief description of the currently profiled states.

SQL
SET profiling=1;

SELECT id FROM forum WHERE MATCH('the best') LIMIT 1;

SHOW PROFILE;
Query OK, 0 rows affected (0.00 sec)

+--------+
| id     |
+--------+
| 241629 |
+--------+
1 row in set (0.35 sec)

+--------------+----------+----------+---------+
| Status       | Duration | Switches | Percent |
+--------------+----------+----------+---------+
| unknown      | 0.000557 | 5        | 0.16    |
| net_read     | 0.000016 | 1        | 0.00    |
| local_search | 0.000076 | 1        | 0.02    |
| sql_parse    | 0.000121 | 1        | 0.03    |
| dict_setup   | 0.000003 | 1        | 0.00    |
| parse        | 0.000072 | 1        | 0.02    |
| transforms   | 0.000331 | 2        | 0.10    |
| init         | 0.001945 | 3        | 0.56    |
| read_docs    | 0.001257 | 76       | 0.36    |
| read_hits    | 0.002598 | 186      | 0.75    |
| get_docs     | 0.089328 | 2691     | 25.80   |
| get_hits     | 0.189626 | 2799     | 54.78   |
| filter       | 0.009369 | 2613     | 2.71    |
| rank         | 0.029669 | 7760     | 8.57    |
| sort         | 0.019070 | 2531     | 5.51    |
| finalize     | 0.000001 | 1        | 0.00    |
| clone_attrs  | 0.002009 | 1        | 0.58    |
| aggregate    | 0.000054 | 2        | 0.02    |
| net_write    | 0.000076 | 2        | 0.02    |
| eval_post    | 0.000001 | 1        | 0.00    |
| total        | 0.346179 | 18678    | 0       |
+--------------+----------+----------+---------+
21 rows in set (0.00 sec)
JSON
POST /search
{
  "index": "test",
  "profile": true,
  "query":
  {
    "match_phrase": { "_all" : "had grown quite" }
  }
}
 "profile": {
    "query": [
      {
        "status": "unknown",
        "duration": 0.000141,
        "switches": 8,
        "percent": 2.17
      },
      {
        "status": "local_df",
        "duration": 0.000870,
        "switches": 1,
        "percent": 13.40
      },
      {
        "status": "local_search",
        "duration": 0.001038,
        "switches": 2,
        "percent": 15.99
      },
      {
        "status": "setup_iter",
        "duration": 0.000154,
        "switches": 14,
        "percent": 2.37
      },
      {
        "status": "dict_setup",
        "duration": 0.000026,
        "switches": 3,
        "percent": 0.40
      },
      {
        "status": "parse",
        "duration": 0.000205,
        "switches": 3,
        "percent": 3.15
      },
      {
        "status": "transforms",
        "duration": 0.000974,
        "switches": 4,
        "percent": 15.01
      },
      {
        "status": "init",
        "duration": 0.002931,
        "switches": 20,
        "percent": 45.16
      },
      {
        "status": "get_docs",
        "duration": 0.000007,
        "switches": 7,
        "percent": 0.10
      },
      {
        "status": "rank",
        "duration": 0.000002,
        "switches": 14,
        "percent": 0.03
      },
      {
        "status": "finalize",
        "duration": 0.000013,
        "switches": 7,
        "percent": 0.20
      },
      {
        "status": "aggregate",
        "duration": 0.000128,
        "switches": 1,
        "percent": 1.97
      },
      {
        "status": "total",
        "duration": 0.006489,
        "switches": 84,
        "percent": 100.00
      }
    ]
  }
¶ 19.9.2

Query plan

The SHOW PLAN SQL statement and the "plan": N JSON interface option display the query execution plan. The plan is generated and stored during the actual execution, so in the case of SQL, profiling must be enabled in the current session before running that statement. This can be done with a SET profiling=1 statement.

Two items are returned in SQL mode:

To view the query execution plan in a JSON query, add "plan": N to the query. The result will appear as a plan property in the result set. N can be one of the following:

SQL
set profiling=1;

select * from hn_small where match('dog|cat') limit 0;

show plan;
*************************** 1. row ***************************
Variable: transformed_tree
   Value: OR(
  AND(KEYWORD(dog, querypos=1)),
  AND(KEYWORD(cat, querypos=2)))

*************************** 2. row ***************************
Variable: enabled_indexes
   Value:
2 rows in set (0.00 sec)
JSON
POST /search
{
  "index": "hn_small",
  "query": {"query_string": "dog|cat"},
  "_source": { "excludes":["*"] },
  "limit": 0,
  "plan": 3
}
{
  "took": 0,
  "timed_out": false,
  "hits": {
    "total": 4453,
    "total_relation": "eq",
    "hits": []
  },
  "plan": {
    "query": {
      "type": "OR",
      "description": "OR( AND(KEYWORD(dog, querypos=1)),  AND(KEYWORD(cat, querypos=2)))",
      "children": [
        {
          "type": "AND",
          "description": "AND(KEYWORD(dog, querypos=1))",
          "children": [
            {
              "type": "KEYWORD",
              "word": "dog",
              "querypos": 1
            }
          ]
        },
        {
          "type": "AND",
          "description": "AND(KEYWORD(cat, querypos=2))",
          "children": [
            {
              "type": "KEYWORD",
              "word": "cat",
              "querypos": 2
            }
          ]
        }
      ]
    }
  }
}

In some cases, the evaluated query tree can be quite different from the original one due to expansions and other transformations.

SQL
SET profiling=1;

SELECT id FROM forum WHERE MATCH('@title way* @content hey') LIMIT 1;

SHOW PLAN;
Query OK, 0 rows affected (0.00 sec)

+--------+
| id     |
+--------+
| 711651 |
+--------+
1 row in set (0.04 sec)

+------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Variable         | Value                                                                                                                                                                                                                                                                                                                                                                                                                   |
+------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| transformed_tree | AND(
  OR(
    OR(
      AND(fields=(title), KEYWORD(wayne, querypos=1, expanded)),
      OR(
        AND(fields=(title), KEYWORD(ways, querypos=1, expanded)),
        AND(fields=(title), KEYWORD(wayyy, querypos=1, expanded)))),
    AND(fields=(title), KEYWORD(way, querypos=1, expanded)),
    OR(fields=(title), KEYWORD(way*, querypos=1, expanded))),
  AND(fields=(content), KEYWORD(hey, querypos=2))) |
+------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
JSON
POST /search
{
  "index": "forum",
  "query": {"query_string": "@title way* @content hey"},
  "_source": { "excludes":["*"] },
  "limit": 1,
  "plan": 3
}
{
  "took":33,
  "timed_out":false,
  "hits":
  {
    "total":105,
    "hits":
    [
       {
          "_id":"711651",
          "_score":2539,
          "_source":{}
       }
    ]
  },
  "plan":
  {
    "query":
    {
      "type":"AND",
      "description":"AND( OR( OR( AND(fields=(title), KEYWORD(wayne, querypos=1, expanded)),  OR( AND(fields=(title), KEYWORD(ways, querypos=1, expanded)),  AND(fields=(title), KEYWORD(wayyy, querypos=1, expanded)))),  AND(fields=(title), KEYWORD(way, querypos=1, expanded)),  OR(fields=(title), KEYWORD(way*, querypos=1, expanded))),  AND(fields=(content), KEYWORD(hey, querypos=2)))",
      "children":
      [
        {
          "type":"OR",
          "description":"OR( OR( AND(fields=(title), KEYWORD(wayne, querypos=1, expanded)),  OR( AND(fields=(title), KEYWORD(ways, querypos=1, expanded)),  AND(fields=(title), KEYWORD(wayyy, querypos=1, expanded)))),  AND(fields=(title), KEYWORD(way, querypos=1, expanded)),  OR(fields=(title), KEYWORD(way*, querypos=1, expanded)))",
          "children":
          [
            {
               "type":"OR",
               "description":"OR( AND(fields=(title), KEYWORD(wayne, querypos=1, expanded)),  OR( AND(fields=(title), KEYWORD(ways, querypos=1, expanded)),  AND(fields=(title), KEYWORD(wayyy, querypos=1, expanded))))",
               "children":
               [
                 {
                   "type":"AND",
                   "description":"AND(fields=(title), KEYWORD(wayne, querypos=1, expanded))",
                   "fields":["title"],
                   "max_field_pos":0,
                   "children":
                   [
                     {
                       "type":"KEYWORD",
                       "word":"wayne",
                       "querypos":1,
                       "expanded":true
                     }
                   ]
                 },
                 {
                   "type":"OR",
                   "description":"OR( AND(fields=(title), KEYWORD(ways, querypos=1, expanded)),  AND(fields=(title), KEYWORD(wayyy, querypos=1, expanded)))",
                   "children":
                   [
                     {
                       "type":"AND",
                       "description":"AND(fields=(title), KEYWORD(ways, querypos=1, expanded))",
                       "fields":["title"],
                       "max_field_pos":0,
                       "children":
                       [
                         {
                           "type":"KEYWORD",
                           "word":"ways",
                           "querypos":1,
                           "expanded":true
                         }
                       ]
                     },
                     {
                       "type":"AND",
                       "description":"AND(fields=(title), KEYWORD(wayyy, querypos=1, expanded))",
                       "fields":["title"],
                       "max_field_pos":0,
                       "children":
                       [
                         {
                           "type":"KEYWORD",
                           "word":"wayyy",
                           "querypos":1,
                           "expanded":true
                         }
                       ]
                     }
                   ]
                 }
               ]
            },
            {
              "type":"AND",
              "description":"AND(fields=(title), KEYWORD(way, querypos=1, expanded))",
              "fields":["title"],
              "max_field_pos":0,
              "children":
              [
                 {
                    "type":"KEYWORD",
                    "word":"way",
                    "querypos":1,
                    "expanded":true
                 }
              ]
            },
            {
              "type":"OR",
              "description":"OR(fields=(title), KEYWORD(way*, querypos=1, expanded))",
              "fields":["title"],
              "max_field_pos":0,
              "children":
              [
                {
                  "type":"KEYWORD",
                  "word":"way*",
                  "querypos":1,
                  "expanded":true
                }
              ]
            }
          ]
        },
        {
          "type":"AND",
          "description":"AND(fields=(content), KEYWORD(hey, querypos=2))",
          "fields":["content"],
          "max_field_pos":0,
          "children":
          [
            {
              "type":"KEYWORD",
              "word":"hey",
              "querypos":2
            }
          ]
        }
      ]
    }
  }
}
JSON
POST /search
{
  "index": "forum",
  "query": {"query_string": "@title way* @content hey"},
  "_source": { "excludes":["*"] },
  "limit": 1,
  "plan": 2
}
{
  "took": 33,
  "timed_out": false,
  "hits": {
    "total": 105,
    "hits": [
      {
        "_id": "711651",
        "_score": 2539,
        "_source": {}
      }
    ]
  },
  "plan": {
    "query": {
      "type": "AND",
      "children": [
        {
          "type": "OR",
          "children": [
            {
              "type": "OR",
              "children": [
                {
                  "type": "AND",
                  "fields": [
                    "title"
                  ],
                  "max_field_pos": 0,
                  "children": [
                    {
                      "type": "KEYWORD",
                      "word": "wayne",
                      "querypos": 1,
                      "expanded": true
                    }
                  ]
                },
                {
                  "type": "OR",
                  "children": [
                    {
                      "type": "AND",
                      "fields": [
                        "title"
                      ],
                      "max_field_pos": 0,
                      "children": [
                        {
                          "type": "KEYWORD",
                          "word": "ways",
                          "querypos": 1,
                          "expanded": true
                        }
                      ]
                    },
                    {
                      "type": "AND",
                      "fields": [
                        "title"
                      ],
                      "max_field_pos": 0,
                      "children": [
                        {
                          "type": "KEYWORD",
                          "word": "wayyy",
                          "querypos": 1,
                          "expanded": true
                        }
                      ]
                    }
                  ]
                }
              ]
            },
            {
              "type": "AND",
              "fields": [
                "title"
              ],
              "max_field_pos": 0,
              "children": [
                {
                  "type": "KEYWORD",
                  "word": "way",
                  "querypos": 1,
                  "expanded": true
                }
              ]
            },
            {
              "type": "OR",
              "fields": [
                "title"
              ],
              "max_field_pos": 0,
              "children": [
                {
                  "type": "KEYWORD",
                  "word": "way*",
                  "querypos": 1,
                  "expanded": true
                }
              ]
            }
          ]
        },
        {
          "type": "AND",
          "fields": [
            "content"
          ],
          "max_field_pos": 0,
          "children": [
            {
              "type": "KEYWORD",
              "word": "hey",
              "querypos": 2
            }
          ]
        }
      ]
    }
  }
}
JSON
POST /search
{
  "index": "forum",
  "query": {"query_string": "@title way* @content hey"},
  "_source": { "excludes":["*"] },
  "limit": 1,
  "plan": 1
}
{
  "took":33,
  "timed_out":false,
  "hits":
  {
    "total":105,
    "hits":
    [
       {
          "_id":"711651",
          "_score":2539,
          "_source":{}
       }
    ]
  },
  "plan":
  {
    "query":
    {
      "description":"AND( OR( OR( AND(fields=(title), KEYWORD(wayne, querypos=1, expanded)),  OR( AND(fields=(title), KEYWORD(ways, querypos=1, expanded)),  AND(fields=(title), KEYWORD(wayyy, querypos=1, expanded)))),  AND(fields=(title), KEYWORD(way, querypos=1, expanded)),  OR(fields=(title), KEYWORD(way*, querypos=1, expanded))),  AND(fields=(content), KEYWORD(hey, querypos=2)))"
    }
  }
}

See also EXPLAIN QUERY. It displays the execution tree of a full-text query without actually executing the query. Note that when using SHOW PLAN after a query to a real-time table, the result will be based on a random disk/RAM chunk. Therefore, if you have recently modified the table's tokenization settings, or if the chunks vary significantly in terms of dictionaries, etc., you might not get the result you are expecting. Take this into account and consider using EXPLAIN QUERY as well.

JSON result set notes

query property contains the transformed full-text query tree. Each node contains:

Dot format for SHOW PLAN

SHOW PLAN format=dot allows returning the full-text query execution tree in a hierarchical format suitable for visualization by existing tools, such as https://dreampuf.github.io/GraphvizOnline:

MySQL [(none)]> show plan option format=dot\G

*************************** 1. row ***************************
Variable: transformed_tree
   Value: digraph "transformed_tree"
{

0 [shape=record,style=filled,bgcolor="lightgrey" label="AND"]
0 -> 1
1 [shape=record,style=filled,bgcolor="lightgrey" label="AND"]
1 -> 2
2 [shape=record label="i | { querypos=1 }"]
0 -> 3
3 [shape=record,style=filled,bgcolor="lightgrey" label="AND"]
3 -> 4
4 [shape=record label="me | { querypos=2 }"]
}

SHOW PLAN graphviz example

¶ 19.10

Table settings and status

¶ 19.10.1

SHOW TABLE STATUS

SHOW TABLE STATUS is an SQL statement that displays various per-table statistics.

The syntax is:

SHOW TABLE index_name STATUS

Depending on index type, displayed statistic includes different set of rows:

Here is the meaning of these values:

SQL
mysql> SHOW TABLE statistic STATUS;
+-----------------------------+--------------------------------------------------------------------------+
| Variable_name               | Value                                                                    |
+-----------------------------+--------------------------------------------------------------------------+
| index_type                  | rt                                                                       |
| indexed_documents           | 146000                                                                   |
| indexed_bytes               | 149504000                                                                |
| ram_bytes                   | 87674788                                                                 |
| disk_bytes                  | 1762811                                                                  |
| disk_mapped                 | 794147                                                                   |
| disk_mapped_cached          | 802816                                                                   |
| disk_mapped_doclists        | 0                                                                        |
| disk_mapped_cached_doclists | 0                                                                        |
| disk_mapped_hitlists        | 0                                                                        |
| disk_mapped_cached_hitlists | 0                                                                        |
| killed_documents            | 0                                                                        |
| killed_rate                 | 0.00%                                                                    |
| ram_chunk                   | 86865484                                                                 |
| ram_chunk_segments_count    | 24                                                                       |
| disk_chunks                 | 1                                                                        |
| mem_limit                   | 134217728                                                                |
| mem_limit_rate              | 95.00%                                                                   |
| ram_bytes_retired           | 0                                                                        |
| locked                      | 0                                                                        |
| tid                         | 0                                                                        |
| tid_saved                   | 0                                                                        |
| query_time_1min             | {"queries":0, "avg":"-", "min":"-", "max":"-", "pct95":"-", "pct99":"-"} |
| query_time_5min             | {"queries":0, "avg":"-", "min":"-", "max":"-", "pct95":"-", "pct99":"-"} |
| query_time_15min            | {"queries":0, "avg":"-", "min":"-", "max":"-", "pct95":"-", "pct99":"-"} |
| query_time_total            | {"queries":0, "avg":"-", "min":"-", "max":"-", "pct95":"-", "pct99":"-"} |
| found_rows_1min             | {"queries":0, "avg":"-", "min":"-", "max":"-", "pct95":"-", "pct99":"-"} |
| found_rows_5min             | {"queries":0, "avg":"-", "min":"-", "max":"-", "pct95":"-", "pct99":"-"} |
| found_rows_15min            | {"queries":0, "avg":"-", "min":"-", "max":"-", "pct95":"-", "pct99":"-"} |
| found_rows_total            | {"queries":0, "avg":"-", "min":"-", "max":"-", "pct95":"-", "pct99":"-"} |
+-----------------------------+--------------------------------------------------------------------------+
29 rows in set (0.00 sec)
PHP
$index->status();
Array(
    [index_type] => rt
    [indexed_documents] => 3
    [indexed_bytes] => 0
    [ram_bytes] => 6678
    [disk_bytes] => 611
    [ram_chunk] => 990
    [ram_chunk_segments_count] => 2
    [mem_limit] => 134217728
    [ram_bytes_retired] => 0
    [locked] => 0
    [tid] => 15
    [query_time_1min] => {"queries":1, "avg_sec":0.001, "min_sec":0.001, "max_sec":0.001, "pct95_sec":0.001, "pct99_sec":0.001}
    [query_time_5min] => {"queries":1, "avg_sec":0.001, "min_sec":0.001, "max_sec":0.001, "pct95_sec":0.001, "pct99_sec":0.001}
    [query_time_15min] => {"queries":1, "avg_sec":0.001, "min_sec":0.001, "max_sec":0.001, "pct95_sec":0.001, "pct99_sec":0.001}
    [query_time_total] => {"queries":1, "avg_sec":0.001, "min_sec":0.001, "max_sec":0.001, "pct95_sec":0.001, "pct99_sec":0.001}
    [found_rows_1min] => {"queries":1, "avg":3, "min":3, "max":3, "pct95":3, "pct99":3}
    [found_rows_5min] => {"queries":1, "avg":3, "min":3, "max":3, "pct95":3, "pct99":3}
    [found_rows_15min] => {"queries":1, "avg":3, "min":3, "max":3, "pct95":3, "pct99":3}
    [found_rows_total] => {"queries":1, "avg":3, "min":3, "max":3, "pct95":3, "pct99":3}

)
Python
utilsApi.sql('SHOW TABLE statistic STATUS')
{u'columns': [{u'Key': {u'type': u'string'}},
              {u'Value': {u'type': u'string'}}],
 u'data': [
   {u'Key': u'index_type', u'Value': u'rt'}
    {u'Key': u'indexed_documents', u'Value': u'3'}
    {u'Key': u'indexed_bytes', u'Value': u'0'}
    {u'Key': u'ram_bytes', u'Value': u'6678'}
    {u'Key': u'disk_bytes', u'Value': u'611'}
    {u'Key': u'ram_chunk', u'Value': u'990'}
    {u'Key': u'ram_chunk_segments_count', u'Value': u'2'}
    {u'Key': u'mem_limit', u'Value': u'134217728'}
    {u'Key': u'ram_bytes_retired', u'Value': u'0'}
    {u'Key': u'locked', u'Value': u'0'}
    {u'Key': u'tid', u'Value': u'15'}
    {u'Key': u'query_time_1min', u'Value': u'{"queries":1, "avg_sec":0.001, "min_sec":0.001, "max_sec":0.001, "pct95_sec":0.001, "pct99_sec":0.001}'}
    {u'Key': u'query_time_5min', u'Value': u'{"queries":1, "avg_sec":0.001, "min_sec":0.001, "max_sec":0.001, "pct95_sec":0.001, "pct99_sec":0.001}'}
    {u'Key': u'query_time_15min', u'Value': u'{"queries":1, "avg_sec":0.001, "min_sec":0.001, "max_sec":0.001, "pct95_sec":0.001, "pct99_sec":0.001}'}
    {u'Key': u'query_time_total', u'Value': u'{"queries":1, "avg_sec":0.001, "min_sec":0.001, "max_sec":0.001, "pct95_sec":0.001, "pct99_sec":0.001}'}
    {u'Key': u'found_rows_1min', u'Value': u'{"queries":1, "avg":3, "min":3, "max":3, "pct95":3, "pct99":3}'}
    {u'Key': u'found_rows_5min', u'Value': u'{"queries":1, "avg":3, "min":3, "max":3, "pct95":3, "pct99":3}'}
    {u'Key': u'found_rows_15min', u'Value': u'{"queries":1, "avg":3, "min":3, "max":3, "pct95":3, "pct99":3}'}
    {u'Key': u'found_rows_total', u'Value': u'{"queries":1, "avg":3, "min":3, "max":3, "pct95":3, "pct99":3}'}],
 u'error': u'',
 u'total': 0,
 u'warning': u''}
Javascript
res = await utilsApi.sql('SHOW TABLE statistic STATUS');
{"columns": [{"Key": {"type": "string"}},
              {"Value": {"type": "string"}}],
 "data": [
   {"Key": "index_type", "Value": "rt"}
    {"Key": "indexed_documents", "Value": "3"}
    {"Key": "indexed_bytes", "Value": "0"}
    {"Key": "ram_bytes", "Value": "6678"}
    {"Key": "disk_bytes", "Value": "611"}
    {"Key": "ram_chunk", "Value": "990"}
    {"Key": "ram_chunk_segments_count", "Value": "2"}
    {"Key": "mem_limit", "Value": "134217728"}
    {"Key": "ram_bytes_retired", "Value": "0"}
    {"Key": "locked", "Value": "0"}
    {"Key": "tid", "Value": "15"}
    {"Key": "query_time_1min", "Value": "{"queries":1, "avg_sec":0.001, "min_sec":0.001, "max_sec":0.001, "pct95_sec":0.001, "pct99_sec":0.001}"}
    {"Key": "query_time_5min", "Value": "{"queries":1, "avg_sec":0.001, "min_sec":0.001, "max_sec":0.001, "pct95_sec":0.001, "pct99_sec":0.001}"}
    {"Key": "query_time_15min", "Value": "{"queries":1, "avg_sec":0.001, "min_sec":0.001, "max_sec":0.001, "pct95_sec":0.001, "pct99_sec":0.001}"}
    {"Key": "query_time_total", "Value": "{"queries":1, "avg_sec":0.001, "min_sec":0.001, "max_sec":0.001, "pct95_sec":0.001, "pct99_sec":0.001}"}
    {"Key": "found_rows_1min", "Value": "{"queries":1, "avg":3, "min":3, "max":3, "pct95":3, "pct99":3}"}
    {"Key": "found_rows_5min", "Value": "{"queries":1, "avg":3, "min":3, "max":3, "pct95":3, "pct99":3}"}
    {"Key": "found_rows_15min", "Value": "{"queries":1, "avg":3, "min":3, "max":3, "pct95":3, "pct99":3}"}
    {"Key": "found_rows_total", "Value": "{"queries":1, "avg":3, "min":3, "max":3, "pct95":3, "pct99":3}"}],
 "error": "",
 "total": 0,
 "warning": ""}
Java
utilsApi.sql("SHOW TABLE statistic STATUS");
{columns=[{ Key : { type=string }},
              { Value : { type=string }}],
  data : [
   { Key=index_type, Value=rt}
    { Key=indexed_documents, Value=3}
    { Key=indexed_bytes, Value=0}
    { Key=ram_bytes, Value=6678}
    { Key=disk_bytes, Value=611}
    { Key=ram_chunk, Value=990}
    { Key=ram_chunk_segments_count, Value=2}
    { Key=mem_limit, Value=134217728}
    { Key=ram_bytes_retired, Value=0}
    { Key=locked, Value=0}
    { Key=tid, Value=15}
    { Key=query_time_1min, Value={queries:1, avg_sec:0.001, min_sec:0.001, max_sec:0.001, pct95_sec:0.001, pct99_sec:0.001}}
    { Key=query_time_5min, Value={queries:1, avg_sec:0.001, min_sec:0.001, max_sec:0.001, pct95_sec:0.001, pct99_sec:0.001}}
    { Key=query_time_15min, Value={queries:1, avg_sec:0.001, min_sec:0.001, max_sec:0.001, pct95_sec:0.001, pct99_sec:0.001}}
    { Key=query_time_total, Value={queries:1, avg_sec:0.001, min_sec:0.001, max_sec:0.001, pct95_sec:0.001, pct99_sec:0.001}}
    { Key=found_rows_1min, Value={queries:1, avg:3, min:3, max:3, pct95:3, pct99:3}}
    { Key=found_rows_5min, Value={queries:1, avg:3, min:3, max:3, pct95:3, pct99:3}}
    { Key=found_rows_15min, Value={queries:1, avg:3, min:3, max:3, pct95:3, pct99:3}}
    { Key=found_rows_total, Value={queries:1, avg:3, min:3, max:3, pct95:3, pct99:3}}],
  error= ,
  total=0,
  warning= }
C#
utilsApi.Sql("SHOW TABLE statistic STATUS");
{columns=[{ Key : { type=string }},
              { Value : { type=string }}],
  data : [
   { Key=index_type, Value=rt}
    { Key=indexed_documents, Value=3}
    { Key=indexed_bytes, Value=0}
    { Key=ram_bytes, Value=6678}
    { Key=disk_bytes, Value=611}
    { Key=ram_chunk, Value=990}
    { Key=ram_chunk_segments_count, Value=2}
    { Key=mem_limit, Value=134217728}
    { Key=ram_bytes_retired, Value=0}
    { Key=locked, Value=0}
    { Key=tid, Value=15}
    { Key=query_time_1min, Value={queries:1, avg_sec:0.001, min_sec:0.001, max_sec:0.001, pct95_sec:0.001, pct99_sec:0.001}}
    { Key=query_time_5min, Value={queries:1, avg_sec:0.001, min_sec:0.001, max_sec:0.001, pct95_sec:0.001, pct99_sec:0.001}}
    { Key=query_time_15min, Value={queries:1, avg_sec:0.001, min_sec:0.001, max_sec:0.001, pct95_sec:0.001, pct99_sec:0.001}}
    { Key=query_time_total, Value={queries:1, avg_sec:0.001, min_sec:0.001, max_sec:0.001, pct95_sec:0.001, pct99_sec:0.001}}
    { Key=found_rows_1min, Value={queries:1, avg:3, min:3, max:3, pct95:3, pct99:3}}
    { Key=found_rows_5min, Value={queries:1, avg:3, min:3, max:3, pct95:3, pct99:3}}
    { Key=found_rows_15min, Value={queries:1, avg:3, min:3, max:3, pct95:3, pct99:3}}
    { Key=found_rows_total, Value={queries:1, avg:3, min:3, max:3, pct95:3, pct99:3}}],
  error="" ,
  total=0,
  warning="" }
TypeScript
res = await utilsApi.sql('SHOW TABLE statistic STATUS');
{
    "columns": 
    [{
        "Key": {"type": "string"}
    },
    {
        "Value": {"type": "string"}
    }],
    "data": 
    [
        {"Key": "index_type", "Value": "rt"}
        {"Key": "indexed_documents", "Value": "3"}
        {"Key": "indexed_bytes", "Value": "0"}
        {"Key": "ram_bytes", "Value": "6678"}
        {"Key": "disk_bytes", "Value": "611"}
        {"Key": "ram_chunk", "Value": "990"}
        {"Key": "ram_chunk_segments_count", "Value": "2"}
        {"Key": "mem_limit", "Value": "134217728"}
        {"Key": "ram_bytes_retired", "Value": "0"}
        {"Key": "locked", "Value": "0"}
        {"Key": "tid", "Value": "15"}
        {"Key": "query_time_1min", "Value": "{"queries":1, "avg_sec":0.001, "min_sec":0.001, "max_sec":0.001, "pct95_sec":0.001, "pct99_sec":0.001}"}
        {"Key": "query_time_5min", "Value": "{"queries":1, "avg_sec":0.001, "min_sec":0.001, "max_sec":0.001, "pct95_sec":0.001, "pct99_sec":0.001}"}
        {"Key": "query_time_15min", "Value": "{"queries":1, "avg_sec":0.001, "min_sec":0.001, "max_sec":0.001, "pct95_sec":0.001, "pct99_sec":0.001}"}
        {"Key": "query_time_total", "Value": "{"queries":1, "avg_sec":0.001, "min_sec":0.001, "max_sec":0.001, "pct95_sec":0.001, "pct99_sec":0.001}"}
        {"Key": "found_rows_1min", "Value": "{"queries":1, "avg":3, "min":3, "max":3, "pct95":3, "pct99":3}"}
        {"Key": "found_rows_5min", "Value": "{"queries":1, "avg":3, "min":3, "max":3, "pct95":3, "pct99":3}"}
        {"Key": "found_rows_15min", "Value": "{"queries":1, "avg":3, "min":3, "max":3, "pct95":3, "pct99":3}"}
        {"Key": "found_rows_total", "Value": "{"queries":1, "avg":3, "min":3, "max":3, "pct95":3, "pct99":3}"}
    ],
    "error": "",
    "total": 0,
    "warning": ""
}
Go
apiClient.UtilsAPI.Sql(context.Background()).Body("SHOW TABLE statistic STATUS").Execute()
{
    "columns": 
    [{
        "Key": {"type": "string"}
    },
    {
        "Value": {"type": "string"}
    }],
    "data": 
    [
        {"Key": "index_type", "Value": "rt"}
        {"Key": "indexed_documents", "Value": "3"}
        {"Key": "indexed_bytes", "Value": "0"}
        {"Key": "ram_bytes", "Value": "6678"}
        {"Key": "disk_bytes", "Value": "611"}
        {"Key": "ram_chunk", "Value": "990"}
        {"Key": "ram_chunk_segments_count", "Value": "2"}
        {"Key": "mem_limit", "Value": "134217728"}
        {"Key": "ram_bytes_retired", "Value": "0"}
        {"Key": "locked", "Value": "0"}
        {"Key": "tid", "Value": "15"}
        {"Key": "query_time_1min", "Value": "{"queries":1, "avg_sec":0.001, "min_sec":0.001, "max_sec":0.001, "pct95_sec":0.001, "pct99_sec":0.001}"}
        {"Key": "query_time_5min", "Value": "{"queries":1, "avg_sec":0.001, "min_sec":0.001, "max_sec":0.001, "pct95_sec":0.001, "pct99_sec":0.001}"}
        {"Key": "query_time_15min", "Value": "{"queries":1, "avg_sec":0.001, "min_sec":0.001, "max_sec":0.001, "pct95_sec":0.001, "pct99_sec":0.001}"}
        {"Key": "query_time_total", "Value": "{"queries":1, "avg_sec":0.001, "min_sec":0.001, "max_sec":0.001, "pct95_sec":0.001, "pct99_sec":0.001}"}
        {"Key": "found_rows_1min", "Value": "{"queries":1, "avg":3, "min":3, "max":3, "pct95":3, "pct99":3}"}
        {"Key": "found_rows_5min", "Value": "{"queries":1, "avg":3, "min":3, "max":3, "pct95":3, "pct99":3}"}
        {"Key": "found_rows_15min", "Value": "{"queries":1, "avg":3, "min":3, "max":3, "pct95":3, "pct99":3}"}
        {"Key": "found_rows_total", "Value": "{"queries":1, "avg":3, "min":3, "max":3, "pct95":3, "pct99":3}"}
    ],
    "error": "",
    "total": 0,
    "warning": ""
}
¶ 19.10.2

SHOW TABLE SETTINGS

SHOW TABLE SETTINGS is an SQL statement that displays per-table settings in a format compatible with the config file.

The syntax is:

SHOW TABLE index_name[.N | CHUNK N] SETTINGS

The output resembles the --dumpconfig option of the indextool utility. The report provides a breakdown of all table settings, including tokenizer and dictionary options.

SQL
SHOW TABLE forum SETTINGS;
+---------------+-----------------------------------------------------------------------------------------------------------+
| Variable_name | Value                                                                                                     |
+---------------+-----------------------------------------------------------------------------------------------------------+
| settings      | min_prefix_len = 3
charset_table = 0..9, A..Z->a..z, _, -, a..z, U+410..U+42F->U+430..U+44F, U+430..U+44F |
+---------------+-----------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

You can also specify a particular chunk number to view the settings of a specific chunk in an RT table. The numbering is 0-based.

SQL
SHOW TABLE forum CHUNK 0 SETTINGS;
+---------------+-----------------------------------------------------------------------------------------------------------+
| Variable_name | Value                                                                                                     |
+---------------+-----------------------------------------------------------------------------------------------------------+
| settings      | min_prefix_len = 3
charset_table = 0..9, A..Z->a..z, _, -, a..z, U+410..U+42F->U+430..U+44F, U+430..U+44F |
+---------------+-----------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
¶ 20

Server settings

¶ 20.1

Section "Searchd" in configuration

The below settings are to be used in the searchd section of the Manticore Search configuration file to control the server's behavior. Below is a summary of each setting:

access_plain_attrs

This setting sets instance-wide defaults for access_plain_attrs. It is optional, with a default value of mmap_preread.

The access_plain_attrs directive allows you to define the default value of access_plain_attrs for all tables managed by this searchd instance. Per-table directives have higher priority and will override this instance-wide default, providing more fine-grained control.

access_blob_attrs

This setting sets instance-wide defaults for access_blob_attrs. It is optional, with a default value of mmap_preread.

The access_blob_attrs directive allows you to define the default value of access_blob_attrs for all tables managed by this searchd instance. Per-table directives have higher priority and will override this instance-wide default, providing more fine-grained control.

access_doclists

This setting sets instance-wide defaults for access_doclists. It is optional, with a default value of file.

The access_doclists directive allows you to define the default value of access_doclists for all tables managed by this searchd instance. Per-table directives have higher priority and will override this instance-wide default, providing more fine-grained control.

access_hitlists

This setting sets instance-wide defaults for access_hitlists. It is optional, with a default value of file.

The access_hitlists directive allows you to define the default value of access_hitlists for all tables managed by this searchd instance. Per-table directives have higher priority and will override this instance-wide default, providing more fine-grained control.

access_dict

This setting sets instance-wide defaults for access_dict. It is optional, with a default value of mmap_preread.

The access_dict directive allows you to define the default value of access_dict for all tables managed by this searchd instance. Per-table directives have higher priority and will override this instance-wide default, providing more fine-grained control.

agent_connect_timeout

This setting sets instance-wide defaults for the agent_connect_timeout parameter.

agent_query_timeout

This setting sets instance-wide defaults for the agent_query_timeout parameter. It can be overridden on a per-query basis using the OPTION agent_query_timeout=XXX clause.

agent_retry_count

This setting is an integer that specifies how many times Manticore will attempt to connect and query remote agents through a distributed table before reporting a fatal query error. The default value is 0 (i.e., no retries). You can also set this value on a per-query basis using the OPTION retry_count=XXX clause. If a per-query option is provided, it will override the value specified in the configuration.

Note that if you use agent mirrors in the definition of your distributed table, the server will select a different mirror for each connection attempt according to the chosen ha_strategy. In this case, the agent_retry_count will be aggregated for all mirrors in a set.

For example, if you have 10 mirrors and set agent_retry_count=5, the server will retry up to 50 times, assuming an average of 5 tries for each of the 10 mirrors (with the ha_strategy = roundrobin option, this will be the case).

However, the value provided as the retry_count option for the agent serves as an absolute limit. In other words, the [retry_count=2] option in the agent definition always means a maximum of 2 attempts, regardless of whether you have specified 1 or 10 mirrors for the agent.

agent_retry_delay

This setting is an integer in milliseconds (or special_suffixes) that specifies the delay before Manticore retries querying a remote agent in case of failure. This value is only relevant when a non-zero agent_retry_count or non-zero per-query retry_count is specified. The default value is 500. You can also set this value on a per-query basis using the OPTION retry_delay=XXX clause. If a per-query option is provided, it will override the value specified in the configuration.

attr_flush_period

When using Update to modify document attributes in real-time, the changes are first written to an in-memory copy of the attributes. These updates occur in a memory-mapped file, meaning the OS decides when to write the changes to disk. Upon normal shutdown of searchd (triggered by a SIGTERM signal), all changes are forced to be written to disk.

You can also instruct searchd to periodically write these changes back to disk to prevent data loss. The interval between these flushes is determined by attr_flush_period, specified in seconds (or special_suffixes).

By default, the value is 0, which disables periodic flushing. However, flushing will still occur during a normal shutdown.

Example
attr_flush_period = 900 # persist updates to disk every 15 minutes

auto_optimize

This setting controls the automatic OPTIMIZE process for table compaction.

By default table compaction occurs automatically. You can modify this behavior with the auto_optimize setting:

Note that toggling auto_optimize on or off doesn't prevent you from running OPTIMIZE TABLE manually.

Disable
auto_optimize = 0 # disable automatic OPTIMIZE
Throttle
auto_optimize = 2 # OPTIMIZE starts at 16 chunks (on 4 cpu cores server)

auto_schema

Manticore supports the automatic creation of tables that don't yet exist but are specified in INSERT statements. This feature is enabled by default. To disable it, set auto_schema = 0 explicitly in your configuration. To re-enable it, set auto_schema = 1 or remove the auto_schema setting from the configuration.

Keep in mind that the /bulk HTTP endpoint does not support automatic table creation.

Disable
auto_schema = 0 # disable automatic table creation
Enable
auto_schema = 1 # enable automatic table creation

binlog_flush

This setting controls the binary log transaction flush/sync mode. It is optional, with a default value of 2 (flush every transaction, sync every second).

The directive determines how frequently the binary log will be flushed to the OS and synced to disk. There are three supported modes:

For those familiar with MySQL and InnoDB, this directive is similar to innodb_flush_log_at_trx_commit. In most cases, the default hybrid mode 2 provides a nice balance of speed and safety, with full RT table data protection against server crashes and some protection against hardware ones.

Example
binlog_flush = 1 # ultimate safety, low speed

binlog_max_log_size

This setting controls the maximum binary log file size. It is optional, with a default value of 268435456, or 256 MB.

A new binlog file will be forcibly opened once the current binlog file reaches this size limit. This results in a finer granularity of logs and can lead to more efficient binlog disk usage under certain borderline workloads. A value of 0 indicates that the binlog file should not be reopened based on size.

Example
binlog_max_log_size = 16M

binlog_path

This setting determines the path for binary log (also known as transaction log) files. It is optional, with a default value of the build-time configured data directory (e.g., /var/lib/manticore/data/binlog.* in Linux).

Binary logs are used for crash recovery of RT table data and for attribute updates of plain disk indices that would otherwise only be stored in RAM until flush. When logging is enabled, every transaction COMMIT-ted into an RT table is written into a log file. Logs are then automatically replayed on startup after an unclean shutdown, recovering the logged changes.

The binlog_path directive specifies the location of binary log files. It should only contain the path; searchd will create and unlink multiple binlog.* files in the directory as necessary (including binlog data, metadata, and lock files, etc).

An empty value disables binary logging, which improves performance but puts the RT table data at risk.

Example
binlog_path = # disable logging
binlog_path = /var/lib/manticore/data # /var/lib/manticore/data/binlog.001 etc will be created

buddy_path

This setting determines the path to the Manticore Buddy binary. It is optional, with a default value being the build-time configured path, which varies across different operating systems. Typically, you don't need to modify this setting. However, it may be useful if you wish to run Manticore Buddy in debug mode, make changes to Manticore Buddy, or implement a new plugin. In the latter case, you can git clone Buddy from https://github.com/manticoresoftware/manticoresearch-buddy, add a new plugin to the directory ./plugins/, and run composer install --prefer-source for easier development after you change the directory to the Buddy source.

To ensure you can run composer, your machine must have PHP 8.2 or higher installed with the following extensions:

--enable-dom
--with-libxml
--enable-tokenizer
--enable-xml
--enable-xmlwriter
--enable-xmlreader
--enable-simplexml
--enable-phar
--enable-bcmath
--with-gmp
--enable-debug
--with-mysqli
--enable-mysqlnd

You can also opt for the special manticore-executor-dev version for Linux amd64 available in the releases, for example: https://github.com/manticoresoftware/executor/releases/tag/v1.0.13

If you go this route, remember to link the dev version of the manticore executor to /usr/bin/php.

Example
buddy_path = manticore-executor -n /usr/share/manticore/modules/manticore-buddy/src/main.php --debug # use the default Manticore Buddy in Linux, but run it in debug mode
buddy_path = manticore-executor -n /opt/homebrew/share/manticore/modules/manticore-buddy/bin/manticore-buddy/src/main.php --debug # use the default Manticore Buddy in MacOS arm64, but run it in debug mode
buddy_path = manticore-executor -n /Users/username/manticoresearch-buddy/src/main.php --debug # use Manticore Buddy from a non-default location

client_timeout

This setting determines the maximum time to wait between requests (in seconds or special_suffixes) when using persistent connections. It is optional, with a default value of five minutes.

Example
client_timeout = 1h

collation_libc_locale

Server libc locale. Optional, default is C.

Specifies the libc locale, affecting the libc-based collations. Refer to collations section for the details.

Example
collation_libc_locale = fr_FR

collation_server

Default server collation. Optional, default is libc_ci.

Specifies the default collation used for incoming requests. The collation can be overridden on a per-query basis. Refer to collations section for the list of available collations and other details.

Example
collation_server = utf8_ci

data_dir

When specified, this setting enables the real-time mode, which is an imperative way of managing data schema. The value should be a path to the directory where you want to store all your tables, binary logs, and everything else needed for the proper functioning of Manticore Search in this mode.
Indexing of plain tables is not allowed when the data_dir is specified. Read more about the difference between the RT mode and the plain mode in this section.

Example
data_dir = /var/lib/manticore

docstore_cache_size

This setting specifies the maximum size of document blocks from document storage that are held in memory. It is optional, with a default value of 16m (16 megabytes).

When stored_fields is used, document blocks are read from disk and uncompressed. Since every block typically holds several documents, it may be reused when processing the next document. For this purpose, the block is held in a server-wide cache. The cache holds uncompressed blocks.

Example
docstore_cache_size = 8m

engine

Default attribute storage engine used when creating tables in RT mode. Can be rowwise (default) or columnar.

Example
engine = columnar

expansion_limit

This setting determines the maximum number of expanded keywords for a single wildcard. It is optional, with a default value of 0 (no limit).

When performing substring searches against tables built with dict = keywords enabled, a single wildcard may potentially result in thousands or even millions of matched keywords (think of matching a* against the entire Oxford dictionary). This directive allows you to limit the impact of such expansions. Setting expansion_limit = N restricts expansions to no more than N of the most frequent matching keywords (per each wildcard in the query).

Example
expansion_limit = 16

expansion_merge_threshold_docs

This setting determines the maximum number of documents in the expanded keyword that allows merging all such keywords together. It is optional, with a default value of 32.

When performing substring searches against tables built with dict = keywords enabled, a single wildcard may potentially result in thousands or even millions of matched keywords. This directive allows you to increase the limit of how many keywords will merge together to speed up matching but uses more memory in the search.

Example
expansion_merge_threshold_docs = 1024

expansion_merge_threshold_hits

This setting determines the maximum number of hits in the expanded keyword that allows merging all such keywords together. It is optional, with a default value of 256.

When performing substring searches against tables built with dict = keywords enabled, a single wildcard may potentially result in thousands or even millions of matched keywords. This directive allows you to increase the limit of how many keywords will merge together to speed up matching but uses more memory in the search.

Example
expansion_merge_threshold_hits = 512

grouping_in_utc

This setting specifies whether timed grouping in API and SQL will be calculated in the local timezone or in UTC. It is optional, with a default value of 0 (meaning 'local timezone').

By default, all 'group by time' expressions (like group by day, week, month, and year in API, also group by day, month, year, yearmonth, yearmonthday in SQL) are done using local time. For example, if you have documents with attributes timed 13:00 utc and 15:00 utc, in the case of grouping, they both will fall into facility groups according to your local timezone setting. If you live in utc, it will be one day, but if you live in utc+10, then these documents will be matched into different group by day facility groups (since 13:00 utc in UTC+10 timezone is 23:00 local time, but 15:00 is 01:00 of the next day). Sometimes such behavior is unacceptable, and it is desirable to make time grouping not dependent on timezone. You can run the server with a defined global TZ environment variable, but it will affect not only grouping but also timestamping in the logs, which may be undesirable as well. Switching 'on' this option (either in config or using SET global statement in SQL) will cause all time grouping expressions to be calculated in UTC, leaving the rest of time-depentend functions (i.e. logging of the server) in local TZ.

timezone

This setting specifies the timezone to be used by date/time-related functions. By default, the local timezone is used, but you can specify a different timezone in IANA format (e.g., Europe/Amsterdam).

Note that this setting has no impact on logging, which always operates in the local timezone.

Also, note that if grouping_in_utc is used, the 'group by time' function will still use UTC, while other date/time-related functions will use the specified timezone. Overall, it is not recommended to mix grouping_in_utc and timezone.

You can configure this option either in the config or by using the SET global statement in SQL.

ha_period_karma

This setting specifies the agent mirror statistics window size, in seconds (or special_suffixes). It is optional, with a default value of 60 seconds.

For a distributed table with agent mirrors in it (see more in agent, the master tracks several different per-mirror counters. These counters are then used for failover and balancing (the master picks the best mirror to use based on the counters). Counters are accumulated in blocks of ha_period_karma seconds.

After beginning a new block, the master may still use the accumulated values from the previous one until the new one is half full. As a result, any previous history stops affecting the mirror choice after 1.5 times ha_period_karma seconds at most.

Even though at most two blocks are used for mirror selection, up to 15 last blocks are stored for instrumentation purposes. These blocks can be inspected using the SHOW AGENT STATUS statement.

Example
ha_period_karma = 2m

ha_ping_interval

This setting configures the interval between agent mirror pings, in milliseconds (or special_suffixes). It is optional, with a default value of 1000 milliseconds.

For a distributed table with agent mirrors in it (see more in agent), the master sends all mirrors a ping command during idle periods. This is to track the current agent status (alive or dead, network roundtrip, etc). The interval between such pings is defined by this directive. To disable pings, set ha_ping_interval to 0.

Example
ha_ping_interval = 3s

hostname_lookup

The hostname_lookup option defines the strategy for renewing hostnames. By default, the IP addresses of agent host names are cached at server start to avoid excessive access to DNS. However, in some cases, the IP can change dynamically (e.g. cloud hosting) and it may be desirable to not cache the IPs. Setting this option to request disables the caching and queries the DNS for each query. The IP addresses can also be manually renewed using the FLUSH HOSTNAMES command.

jobs_queue_size

The jobs_queue_size setting defines how many "jobs" can be in the queue at the same time. It is unlimited by default.

In most cases, a "job" means one query to a single local table (plain table or a disk chunk of a real-time table). For example, if you have a distributed table consisting of 2 local tables or a real-time table with 2 disk chunks, a search query to either of them will mostly put 2 jobs in the queue. Then, the thread pool (whose size is defined by threads will process them. However, in some cases, if the query is too complex, more jobs can be created. Changing this setting is recommended when max_connections and threads are not enough to find a balance between the desired performance.

listen_backlog

The listen_backlog setting determines the length of the TCP listen backlog for incoming connections. This is particularly relevant for Windows builds that process requests one by one. When the connection queue reaches its limit, new incoming connections will be refused.
For non-Windows builds, the default value should work fine, and there is usually no need to adjust this setting.

Example
listen_backlog = 20

listen

This setting lets you specify an IP address and port, or Unix-domain socket path, that Manticore will accept connections on.

The general syntax for listen is:

listen = ( address ":" port | port | path | address ":" port start - port end ) [ ":" protocol [ "_vip" ] [ "_readonly" ] ]

You can specify:

If you specify a port number but not an address, searchd will listen on all network interfaces. Unix path is identified by a leading slash. Port range can be set only for the replication protocol.

You can also specify a protocol handler (listener) to be used for connections on this socket. The listeners are:

Adding suffix _vip to client protocols (that is, all except replication, for instance mysql_vip or http_vip or just _vip) forces creating a dedicated thread for the connection to bypass different limitations. That's useful for node maintenance in case of severe overload when the server would either stall or not let you connect via a regular port otherwise.

Suffix _readonly sets read-only mode for the listener and limits it to accept only read queries.

Example
listen = localhost
listen = localhost:5000 # listen for remote agents (binary API) and http/https requests on port 5000 at localhost
listen = 192.168.0.1:5000 # listen for remote agents (binary API) and http/https requests on port 5000 at 192.168.0.1
listen = /var/run/manticore/manticore.s # listen for binary API requests on unix socket
listen = /var/run/manticore/manticore.s:mysql # listen for mysql requests on unix socket
listen = 9312 # listen for remote agents (binary API) and http/https requests on port 9312 on any interface
listen = localhost:9306:mysql # listen for mysql requests on port 9306 at localhost
listen = localhost:9307:mysql_readonly # listen for mysql requests on port 9307 at localhost and accept only read queries
listen = 127.0.0.1:9308:http # listen for http requests as well as connections from remote agents (and binary API) on port 9308 at localhost
listen = 192.168.0.1:9320-9328:replication # listen for replication connections on ports 9320-9328 at 192.168.0.1
listen = 127.0.0.1:9443:https # listen for https requests (not http) on port 9443 at 127.0.0.1
listen = 127.0.0.1:9312:sphinx # listen for legacy Sphinx requests (e.g. from SphinxSE) on port 9312 at 127.0.0.1

There can be multiple listen directives. searchd will listen for client connections on all specified ports and sockets. The default config provided in Manticore packages defines listening on ports:

If you don't specify any listen in the configuration at all, Manticore will wait for connections on:

Listening on privileged ports

By default, Linux won't allow you to let Manticore listen on a port below 1024 (e.g. listen = 127.0.0.1:80:http or listen = 127.0.0.1:443:https) unless you run searchd under root. If you still want to be able to start Manticore, so it listens on ports < 1024 under a non-root user, consider doing one of the following (either of these should work):

Technical details about Sphinx API protocol and TFO

Legacy Sphinx protocol has 2 phases: handshake exchanging and data flow. The handshake consists of a packet of 4 bytes from the client, and a packet of 4 bytes from the daemon with only one purpose - the client determines that the remote is a real Sphinx daemon, the daemon determines that the remote is a real Sphinx client. The main dataflow is quite simple: let's both sides declare their handshakes, and the opposite check them. That exchange with short packets implies using special `TCP_NODELAY` flag, which switches off Nagle's TCP algorithm and declares that the TCP connection will be performed as a dialogue of small packages. However, it is not strictly defined who speaks first in this negotiation. Historically, all clients that use the binary API speak first: send handshake, then read 4 bytes from a daemon, then send a request and read an answer from the daemon. When we improved Sphinx protocol compatibility, we considered these things: 1. Usually, master-agent communication is established from a known client to a known host on a known port. So, it is quite not possible that the endpoint will provide a wrong handshake. So, we may implicitly assume that both sides are valid and really speak in Sphinx proto. 2. Given this assumption, we can 'glue' a handshake to the real request and send it in one packet. If the backend is a legacy Sphinx daemon, it will just read this glued packet as 4 bytes of a handshake, then request body. Since they both came in one packet, the backend socket has -1 RTT, and the frontend buffer still works despite that fact usual way. 3. Continuing the assumption: since the 'query' packet is quite small, and the handshake is even smaller, let's send both in the initial 'SYN' TCP package using modern TFO (tcp-fast-open) technique. That is: we connect to a remote node with the glued handshake + body package. The daemon accepts the connection and immediately has both the handshake and the body in a socket buffer, as they came in the very first TCP 'SYN' packet. That eliminates another one RTT. 4. Finally, teach the daemon to accept this improvement. Actually, from the application, it implies NOT to use `TCP_NODELAY`. And, from the system side, it implies to ensure that on the daemon side, accepting TFO is activated, and on the client side, sending TFO is also activated. By default, in modern systems, client TFO is already activated by default, so you only have to tune the server TFO for all things to work. All these improvements without actually changing the protocol itself allowed us to eliminate 1.5 RTT of the TCP protocol from the connection. Which is, if the query and answer are capable of being placed in a single TCP package, decreases the whole binary API session from 3.5 RTT to 2 RTT - which makes network negotiation about 2 times faster. So, all our improvements are stated around an initially undefined statement: 'who speaks first.' If a client speaks first, we may apply all these optimizations and effectively process connect + handshake + query in a single TFO package. Moreover, we can look at the beginning of the received package and determine a real protocol. That is why you can connect to one and the same port via API/http/https. If the daemon has to speak first, all these optimizations are impossible, and the multiprotocol is also impossible. That is why we have a dedicated port for MySQL and did not unify it with all the other protocols into a same port. Suddenly, among all clients, one was written implying that daemon should send a handshake first. That is - no possibility to all the described improvements. That is SphinxSE plugin for mysql/mariadb. So, specially for this single client we dedicated `sphinx` proto definition to work most legacy way. Namely: both sides activate `TCP_NODELAY` and exchange with small packages. The daemon sends its handshake on connect, then the client sends its, and then everything works usual way. That is not very optimal, but just works. If you use SphinxSE to connect to Manticore - you have to dedicate a listener with explicitly stated `sphinx` proto. For another clients - avoid to use this listener as it is slower. If you use another legacy Sphinx API clients - check first, if they are able to work with non-dedicated multiprotocol port. For master-agent linkage using the non-dedicated (multiprotocol) port and enabling client and server TFO works well and will definitely make working of network backend faster, especially if you have very light and fast queries.

listen_tfo

This setting allows the TCP_FASTOPEN flag for all listeners. By default, it is managed by the system but may be explicitly switched off by setting it to '0'.

For general knowledge about the TCP Fast Open extension, please consult with Wikipedia. In short, it allows the elimination of one TCP round-trip when establishing a connection.

In practice, using TFO in many situations may optimize client-agent network efficiency, as if persistent agents are in play, but without holding active connections, and also without limitation for the maximum num of connections.

On modern OS, TFO support is usually switched 'on' at the system level, but this is just a 'capability', not the rule. Linux (as the most progressive) has supported it since 2011, on kernels starting from 3.7 (for the server-side). Windows has supported it from some builds of Windows 10. Other operating systems (FreeBSD, MacOS) are also in the game.

For Linux system server checks variable /proc/sys/net/ipv4/tcp_fastopen and behaves according to it. Bit 0 manages client side, bit 1 rules listeners. By default, the system has this parameter set to 1, i.e., clients enabled, listeners disabled.

log

The log setting specifies the name of the log file where all searchd run time events will be logged. If not specified, the default name is 'searchd.log'.

Alternatively, you can use the 'syslog' as the file name. In this case, the events will be sent to the syslog daemon. To use the syslog option, you need to configure Manticore with the -–with-syslog option during building.

Example
log = /var/log/searchd.log

max_batch_queries

Limits the amount of queries per batch. Optional, default is 32.

Makes searchd perform a sanity check of the amount of queries submitted in a single batch when using multi-queries. Set it to 0 to skip the check.

Example
max_batch_queries = 256

max_connections

Maximum number of simultaneous client connections. Unlimited by default. That is usually noticeable only when using any kind of persistent connections, like cli mysql sessions or persistent remote connections from remote distributed tables. When the limit is exceeded you can still connect to the server using the VIP connection. VIP connections are not counted towards the limit.

Example
max_connections = 10

max_threads_per_query

Instance-wide limit of threads one operation can use. By default, appropriate operations can occupy all CPU cores, leaving no room for other operations. For example, call pq against a considerably large percolate table can utilize all threads for tens of seconds. Setting max_threads_per_query to, say, half of threads will ensure that you can run a couple of such call pq operations in parallel.

You can also set this setting as a session or a global variable during runtime.

Additionally, you can control the behavior on a per-query basis with the help of the threads OPTION.

Example
max_threads_per_query = 4

max_filters

Maximum allowed per-query filter count. This setting is only used for internal sanity checks and does not directly affect RAM usage or performance. Optional, the default is 256.

Example
max_filters = 1024

max_filter_values

Maximum allowed per-filter values count. This setting is only used for internal sanity checks and does not directly affect RAM usage or performance. Optional, the default is 4096.

Example
max_filter_values = 16384

max_open_files

The maximum number of files that the server is allowed to open is called the "soft limit". Note that serving large fragmented real-time tables may require this limit to be set high, as each disk chunk may occupy a dozen or more files. For example, a real-time table with 1000 chunks may require thousands of files to be opened simultaneously. If you encounter the error 'Too many open files' in the logs, try adjusting this option, as it may help resolve the issue.

There is also a "hard limit" that cannot be exceeded by the option. This limit is defined by the system and can be changed in the file /etc/security/limits.conf on Linux. Other operating systems may have different approaches, so consult your manuals for more information.

Example
max_open_files = 10000

Apart from direct numeric values, you can use the magic word 'max' to set the limit equal to the available current hard limit.

Example
max_open_files = max

max_packet_size

Maximum allowed network packet size. This setting limits both query packets from clients and response packets from remote agents in a distributed environment. Only used for internal sanity checks, it does not directly affect RAM usage or performance. Optional, the default is 8M.

Example
max_packet_size = 32M

mysql_version_string

A server version string to return via the MySQL protocol. Optional, the default is empty (returns the Manticore version).

Several picky MySQL client libraries depend on a particular version number format used by MySQL, and moreover, sometimes choose a different execution path based on the reported version number (rather than the indicated capabilities flags). For instance, Python MySQLdb 1.2.2 throws an exception when the version number is not in X.Y.ZZ format; MySQL .NET connector 6.3.x fails internally on version numbers 1.x along with a certain combination of flags, etc. To work around that, you can use the mysql_version_string directive and have searchd report a different version to clients connecting over the MySQL protocol. (By default, it reports its own version.)

Example
mysql_version_string = 5.0.37

net_workers

Number of network threads, the default is 1.

This setting is useful for extremely high query rates when just one thread is not enough to manage all the incoming queries.

net_wait_tm

Controls the busy loop interval of the network thread. The default is -1, and it can be set to -1, 0, or a positive integer.

In cases where the server is configured as a pure master and just routes requests to agents, it is important to handle requests without delays and not allow the network thread to sleep. There is a busy loop for that. After an incoming request, the network thread uses CPU poll for 10 * net_wait_tm milliseconds if net_wait_tm is a positive number or polls only with the CPU if net_wait_tm is 0. Also, the busy loop can be disabled with net_wait_tm = -1 - in this case, the poller sets the timeout to the actual agent's timeouts on the system polling call.

WARNING: A CPU busy loop actually loads the CPU core, so setting this value to any non-default value will cause noticeable CPU usage even with an idle server.

net_throttle_accept

Defines how many clients are accepted on each iteration of the network loop. Default is 0 (unlimited), which should be fine for most users. This is a fine-tuning option to control the throughput of the network loop in high load scenarios.

net_throttle_action

Defines how many requests are processed on each iteration of the network loop. The default is 0 (unlimited), which should be fine for most users. This is a fine-tuning option to control the throughput of the network loop in high load scenarios.

network_timeout

Network client request read/write timeout, in seconds (or special_suffixes). Optional, the default is 5 seconds. searchd will forcibly close a client connection which fails to send a query or read a result within this timeout.

Note also the reset_network_timeout_on_packet parameter. This parameter alters the behavior of network_timeout from applying to the entire query or result to individual packets instead. Typically, a query/result fits within one or two packets. However, in cases where a large amount of data is required, this parameter can be invaluable in maintaining active operations.

Example
network_timeout = 10s

node_address

This setting allows you to specify the network address of the node. By default, it is set to the replication listen ddress. This is correct in most cases; however, there are situations where you have to specify it manually:

Example
node_address = 10.101.0.10

not_terms_only_allowed

This setting determines whether to allow queries with only the negation full-text operator. Optional, the default is 0 (fail queries with only the NOT operator).

Example
not_terms_only_allowed = 1

optimize_cutoff

Sets the default table compaction threshold. Read more here - Number of optimized disk chunks. This setting can be overridden with the per-query option cutoff. It can also be changed dynamically via SET GLOBAL.

Example
optimize_cutoff = 4

persistent_connections_limit

This setting determines the maximum number of simultaneous persistent connections to remote persistent agents. Each time an agent defined under agent_persistent is connected, we try to reuse an existing connection (if any), or connect and save the connection for future use. However, in some cases, it makes sense to limit the number of such persistent connections. This directive defines the limit. It affects the number of connections to each agent's host across all distributed tables.

It is reasonable to set the value equal to or less than the max_connections option in the agent's config.

Example
persistent_connections_limit = 29 # assume that each host of agents has max_connections = 30 (or 29).

pid_file

pid_file is a mandatory configuration option in Manticore search that specifies the path of the file where the process ID of the searchd server is stored.

The searchd process ID file is re-created and locked on startup, and contains the head server process ID while the server is running. It is unlinked on server shutdown.
The purpose of this file is to enable Manticore to perform various internal tasks, such as checking whether there is already a running instance of searchd, stopping searchd, and notifying it that it should rotate the tables. The file can also be used for external automation scripts.

Example
pid_file = /var/run/manticore/searchd.pid

predicted_time_costs

Costs for the query time prediction model, in nanoseconds. Optional, the default is doc=64, hit=48, skip=2048, match=64.

Example
predicted_time_costs = doc=128, hit=96, skip=4096, match=128

Terminating queries before completion based on their execution time (with the max query time setting) is a nice safety net, but it comes with an inherent drawback: indeterministic (unstable) results. That is, if you repeat the very same (complex) search query with a time limit several times, the time limit will be hit at different stages, and you will get different result sets.

SQL
SELECT  OPTION max_query_time
API
SetMaxQueryTime()

There is a new option, SELECT … OPTION max_predicted_time, that lets you limit the query time and get stable, repeatable results. Instead of regularly checking the actual current time while evaluating the query, which is indeterministic, it predicts the current running time using a simple linear model instead:

predicted_time =
    doc_cost * processed_documents +
    hit_cost * processed_hits +
    skip_cost * skiplist_jumps +
    match_cost * found_matches

The query is then terminated early when the predicted_time reaches a given limit.

Of course, this is not a hard limit on the actual time spent (it is, however, a hard limit on the amount of processing work done), and a simple linear model is in no way an ideally precise one. So the wall clock time may be either below or over the target limit. However, the error margins are quite acceptable: for instance, in our experiments with a 100 msec target limit, the majority of the test queries fell into a 95 to 105 msec range, and all the queries were in an 80 to 120 msec range. Also, as a nice side effect, using the modeled query time instead of measuring the actual run time results in somewhat fewer gettimeofday() calls, too.

No two server makes and models are identical, so the predicted_time_costs directive lets you configure the costs for the model above. For convenience, they are integers, counted in nanoseconds. (The limit in max_predicted_time is counted in milliseconds, and having to specify cost values as 0.000128 ms instead of 128 ns is somewhat more error-prone.) It is not necessary to specify all four costs at once, as the missed ones will take the default values. However, we strongly suggest specifying all of them for readability.

preopen_tables

The preopen_tables configuration directive specifies whether to forcibly preopen all tables on startup. The default value is 1, which means that all tables will be preopened regardless of the per-table preopen setting. If set to 0, the per-table settings can take effect, and they will default to 0.

Pre-opening tables can prevent races between search queries and rotations that can cause queries to fail occasionally. However, it also uses more file handles. In most scenarios, it is recommended to preopen tables.

Here's an example configuration:

Example
preopen_tables = 1

pseudo_sharding

The pseudo_sharding configuration option enables parallelization of search queries to local plain and real-time tables, regardless of whether they are queried directly or through a distributed table. This feature will automatically parallelize queries to up to the number of threads specified in searchd.threads # of threads.

Note that if your worker threads are already busy, because you have:

then enabling pseudo_sharding may not provide any benefits and may even result in a slight decrease in throughput. If you prioritize higher throughput over lower latency, it's recommended to disable this option.

Enabled by default.

Example
pseudo_sharding = 0

replication_connect_timeout

The replication_connect_timeout directive defines the timeout for connecting to a remote node. By default, the value is assumed to be in milliseconds, but it can have another suffix. The default value is 1000 (1 second).

When connecting to a remote node, Manticore will wait for this amount of time at most to complete the connection successfully. If the timeout is reached but the connection has not been established, and retries are enabled, a retry will be initiated.

replication_query_timeout

The replication_query_timeout sets the amount of time that searchd will wait for a remote node to complete a query. The default value is 3000 milliseconds (3 seconds), but can be suffixed to indicate a different unit of time.

After establishing a connection, Manticore will wait for a maximum of replication_query_timeout for the remote node to complete. Note that this timeout is separate from the replication_connect_timeout, and the total possible delay caused by a remote node will be the sum of both values.

replication_retry_count

This setting is an integer that specifies how many times Manticore will attempt to connect and query a remote node during replication before reporting a fatal query error. The default value is 3.

replication_retry_delay

This setting is an integer in milliseconds (or special_suffixes) that specifies the delay before Manticore retries querying a remote node in case of failure during replication. This value is only relevant when a non-zero value is specified. The default value is 500.

qcache_max_bytes

This configuration sets the maximum amount of RAM allocated for cached result sets in bytes. The default value is 16777216, which is equivalent to 16 megabytes. If the value is set to 0, the query cache is disabled. For more information about the query cache, please refer to the query cache for details.

Example
qcache_max_bytes = 16777216

qcache_thresh_msec

Integer, in milliseconds. The minimum wall time threshold for a query result to be cached. Defaults to 3000, or 3 seconds. 0 means cache everything. Refer to query cache for details. This value also may be expressed with time special_suffixes, but use it with care and don't confuse yourself with the name of the value itself, containing '_msec'.

qcache_ttl_sec

Integer, in seconds. The expiration period for a cached result set. Defaults to 60, or 1 minute. The minimum possible value is 1 second. Refer to query cache for details. This value also may be expressed with time special_suffixes, but use it with care and don't confuse yourself with the name of the value itself, containing '_sec'.

query_log_format

Query log format. Optional, allowed values are plain and sphinxql, default is sphinxql.

The sphinxql mode logs valid SQL statements. The plain mode logs queries in a plain text format (mostly suitable for purely full-text use cases). This directive allows you to switch between the two formats on search server startup. The log format can also be altered on the fly, using SET GLOBAL query_log_format=sphinxql syntax. Refer to Query logging for more details.

Example
query_log_format = sphinxql

query_log_min_msec

Limit (in milliseconds) that prevents the query from being written to the query log. Optional, default is 0 (all queries are written to the query log). This directive specifies that only queries with execution times that exceed the specified limit will be logged (this value also may be expressed with time special_suffixes, but use it with care and don't confuse yourself with the name of the value itself, containing _msec).

query_log

Query log file name. Optional, default is empty (do not log queries). All search queries (such as SELECT ... but not INSERT/REPLACE/UPDATE queries) will be logged in this file. The format is described in Query logging. In case of 'plain' format, you can use 'syslog' as the path to the log file. In this case, all search queries will be sent to the syslog daemon with LOG_INFO priority, prefixed with '[query]' instead of timestamp. To use the syslog option, Manticore must be configured with -–with-syslog on building.

Example
query_log = /var/log/query.log

query_log_mode

The query_log_mode directive allows you to set a different permission for the searchd and query log files. By default, these log files are created with 600 permission, meaning that only the user under which the server runs and root users can read the log files.
This directive can be handy if you want to allow other users to read the log files, for example, monitoring solutions running on non-root users.

Example
query_log_mode  = 666

read_buffer_docs

The read_buffer_docs directive controls the per-keyword read buffer size for document lists. For every keyword occurrence in every search query, there are two associated read buffers: one for the document list and one for the hit list. This setting lets you control the document list buffer size.

A larger buffer size might increase per-query RAM use, but it could possibly decrease I/O time. It makes sense to set larger values for slow storage, but for storage capable of high IOPS, experimenting should be done in the low values area.

The default value is 256K, and the minimal value is 8K. You may also set read_buffer_docs on a per-table basis, which will override anything set on the server's config level.

Example
read_buffer_docs = 128K

read_buffer_hits

The read_buffer_hits directive specifies the per-keyword read buffer size for hit lists in search queries. By default, the size is 256K and the minimum value is 8K. For every keyword occurrence in a search query, there are two associated read buffers, one for the document list and one for the hit list. Increasing the buffer size can increase per-query RAM use but decrease I/O time. For slow storage, larger buffer sizes make sense, while for storage capable of high IOPS, experimenting should be done in the low values area.

This setting can also be specified on a per-table basis using the read_buffer_hits option in read_buffer_hits which will override the server-level setting.

Example
read_buffer_hits = 128K

read_unhinted

Unhinted read size. Optional, default is 32K, minimal 1K

When querying, some reads know in advance exactly how much data is there to be read, but some currently do not. Most prominently, hit list size is not currently known in advance. This setting lets you control how much data to read in such cases. It impacts hit list I/O time, reducing it for lists larger than unhinted read size, but raising it for smaller lists. It does not affect RAM usage because the read buffer will already be allocated. So it should not be greater than read_buffer.

Example
read_unhinted = 32K

reset_network_timeout_on_packet

Refines the behavior of networking timeouts (such as network_timeout, read_timeout, and agent_query_timeout).

When set to 0, timeouts limit the maximum time for sending the entire request/query.
When set to 1 (default), timeouts limit the maximum time between network activities.

With replication, a node may need to send a large file (for example, 100GB) to another node. Assume the network can transfer data at 1GB/s, with a series of packets of 4-5MB each. To transfer the entire file, you would need 100 seconds. A default timeout of 5 seconds would only allow the transfer of 5GB before the connection is dropped. Increasing the timeout could be a workaround, but it is not scalable (for instance, the next file might be 150GB, leading to failure again). However, with the default reset_network_timeout_on_packet set to 1, the timeout is applied not to the entire transfer but to individual packets. As long as the transfer is in progress (and data is actually being received over the network during the timeout period), it is kept alive. If the transfer gets stuck, such that a timeout occurs between packets, it will be dropped.

Note that if you set up a distributed table, each node — both master and agents — should be tuned. On the master side, agent_query_timeout is affected; on agents, network_timeout is relevant.

Example
reset_network_timeout_on_packet = 0

rt_flush_period

RT tables RAM chunk flush check period, in seconds (or special_suffixes). Optional, default is 10 hours.

Actively updated RT tables that fully fit in RAM chunks can still result in ever-growing binlogs, impacting disk use and crash recovery time. With this directive, the search server performs periodic flush checks, and eligible RAM chunks can be saved, enabling consequential binlog cleanup. See Binary logging for more details.

Example
rt_flush_period = 3600 # 1 hour

rt_merge_iops

A maximum number of I/O operations (per second) that the RT chunks merge thread is allowed to start. Optional, default is 0 (no limit).

This directive lets you throttle down the I/O impact arising from the OPTIMIZE statements. It is guaranteed that all RT optimization activities will not generate more disk IOPS (I/Os per second) than the configured limit. Limiting rt_merge_iops can reduce search performance degradation caused by merging.

Example
rt_merge_iops = 40

rt_merge_maxiosize

A maximum size of an I/O operation that the RT chunks merge thread is allowed to start. Optional, default is 0 (no limit).

This directive lets you throttle down the I/O impact arising from the OPTIMIZE statements. I/Os larger than this limit will be broken down into two or more I/Os, which will then be accounted for as separate I/Os with regards to the rt_merge_iops limit. Thus, it is guaranteed that all optimization activities will not generate more than (rt_merge_iops * rt_merge_maxiosize) bytes of disk I/O per second.

Example
rt_merge_maxiosize = 1M

seamless_rotate

Prevents searchd stalls while rotating tables with huge amounts of data to precache. Optional, default is 1 (enable seamless rotation). On Windows systems, seamless rotation is disabled by default.

Tables may contain some data that needs to be precached in RAM. At the moment, .spa, .spb, .spi, and .spm files are fully precached (they contain attribute data, blob attribute data, keyword table, and killed row map, respectively.) Without seamless rotate, rotating a table tries to use as little RAM as possible and works as follows:

  1. New queries are temporarily rejected (with "retry" error code);
  2. searchd waits for all currently running queries to finish;
  3. The old table is deallocated, and its files are renamed;
  4. New table files are renamed, and required RAM is allocated;
  5. New table attribute and dictionary data are preloaded to RAM;
  6. searchd resumes serving queries from the new table.

However, if there's a lot of attribute or dictionary data, then the preloading step could take a noticeable amount of time - up to several minutes in the case of preloading 1-5+ GB files.

With seamless rotate enabled, rotation works as follows:

  1. New table RAM storage is allocated;
  2. New table attribute and dictionary data are asynchronously preloaded to RAM;
  3. On success, the old table is deallocated, and both tables' files are renamed;
  4. On failure, the new table is deallocated;
  5. At any given moment, queries are served either from the old or new table copy.

Seamless rotate comes at the cost of higher peak memory usage during the rotation (because both old and new copies of .spa/.spb/.spi/.spm data need to be in RAM while preloading the new copy). Average usage remains the same.

Example
seamless_rotate = 1

secondary_indexes

This option enables/disables the use of secondary indexes for search queries. It is optional, and the default is 1 (enabled). Note that you don't need to enable it for indexing as it is always enabled as long as the Manticore Columnar Library is installed. The latter is also required for using the indexes when searching. There are three modes available:

Note that secondary indexes are not effective for full-text queries.

Example
secondary_indexes = 1

server_id

Integer number that serves as a server identifier used as a seed to generate a unique short UUID for nodes that are part of a replication cluster. The server_id must be unique across the nodes of a cluster and in the range from 0 to 127. If server_id is not set, the MAC address or a random number will be used as a seed for the short UUID.

Example
server_id = 1

shutdown_timeout

searchd --stopwait waiting time, in seconds (or special_suffixes). Optional, default is 60 seconds.

When you run searchd --stopwait your server needs to perform some activities before stopping, such as finishing queries, flushing RT RAM chunks, flushing attributes, and updating the binlog. These tasks require some time. searchd --stopwait will wait up to shutdown_time seconds for the server to finish its jobs. The suitable time depends on your table size and load.

Example
shutdown_timeout = 3m # wait for up to 3 minutes

shutdown_token

SHA1 hash of the password required to invoke the 'shutdown' command from a VIP Manticore SQL connection. Without it,debug 'shutdown' subcommand will never cause the server to stop. Note that such simple hashing should not be considered strong protection, as we don't use a salted hash or any kind of modern hash function. It is intended as a fool-proof measure for housekeeping daemons in a local network.

snippets_file_prefix

A prefix to prepend to the local file names when generating snippets. Optional, default is the current working folder.

This prefix can be used in distributed snippets generation along with load_files or load_files_scattered options.

Note that this is a prefix and not a path! This means that if a prefix is set to "server1" and the request refers to "file23", searchd will attempt to open "server1file23" (all of that without quotes). So, if you need it to be a path, you have to include the trailing slash.

After constructing the final file path, the server unwinds all relative dirs and compares the final result with the value of snippet_file_prefix. If the result does not begin with the prefix, such a file will be rejected with an error message.

For example, if you set it to /mnt/data and someone calls snippet generation with the file ../etc/passwd as the source, they will get the error message:

File '/mnt/data/../etc/passwd' escapes '/mnt/data/' scope

instead of the content of the file.

Also, with a non-set parameter and reading /etc/passwd, it will actually read /daemon/working/folder/etc/passwd since the default for the parameter is the server's working folder.

Note also that this is a local option; it does not affect the agents in any way. So you can safely set a prefix on a master server. The requests routed to the agents will not be affected by the master's setting. They will, however, be affected by the agent's own settings.

This might be useful, for instance, when the document storage locations (whether local storage or NAS mountpoints) are inconsistent across the servers.

Example
snippets_file_prefix = /mnt/common/server1/

WARNING: If you still want to access files from the FS root, you have to explicitly set snippets_file_prefix to empty value (by snippets_file_prefix= line), or to root (by snippets_file_prefix=/).

sphinxql_state

Path to a file where the current SQL state will be serialized.

On server startup, this file gets replayed. On eligible state changes (e.g., SET GLOBAL), this file gets rewritten automatically. This can prevent a hard-to-diagnose problem: If you load UDF functions but Manticore crashes, when it gets (automatically) restarted, your UDF and global variables will no longer be available. Using persistent state helps ensure a graceful recovery with no such surprises.

sphinxql_state cannot be used to execute arbitrary commands, such as CREATE TABLE.

Example
sphinxql_state = uservars.sql

sphinxql_timeout

Maximum time to wait between requests (in seconds, or special_suffixes) when using the SQL interface. Optional, default is 15 minutes.

Example
sphinxql_timeout = 15m

ssl_ca

Path to the SSL Certificate Authority (CA) certificate file (also known as root certificate). Optional, default is empty. When not empty, the certificate in ssl_cert should be signed by this root certificate.

The server uses the CA file to verify the signature on the certificate. The file must be in PEM format.

Example
ssl_ca = keys/ca-cert.pem

ssl_cert

Path to the server's SSL certificate. Optional, default is empty.

The server uses this certificate as a self-signed public key to encrypt HTTP traffic over SSL. The file must be in PEM format.

Example
ssl_cert = keys/server-cert.pem

ssl_key

Path to the SSL certificate key. Optional, default is empty.

The server uses this private key to encrypt HTTP traffic over SSL. The file must be in PEM format.

Example
ssl_key = keys/server-key.pem

subtree_docs_cache

Max common subtree document cache size, per-query. Optional, default is 0 (disabled).

This setting limits the RAM usage of a common subtree optimizer (see multi-queries). At most, this much RAM will be spent to cache document entries for each query. Setting the limit to 0 disables the optimizer.

Example
subtree_docs_cache = 8M

subtree_hits_cache

Max common subtree hit cache size, per-query. Optional, default is 0 (disabled).

This setting limits the RAM usage of a common subtree optimizer (see multi-queries). At most, this much RAM will be spent to cache keyword occurrences (hits) for each query. Setting the limit to 0 disables the optimizer.

Example
subtree_hits_cache = 16M

threads

Number of working threads (or, size of thread pool) for the Manticore daemon. Manticore creates this number of OS threads on start, and they perform all jobs inside the daemon, such as executing queries, creating snippets, etc. Some operations may be split into sub-tasks and executed in parallel, for example:

By default, it's set to the number of CPU cores on the server. Manticore creates the threads on start and keeps them until it's stopped. Each sub-task can use one of the threads when it needs it. When the sub-task finishes, it releases the thread so another sub-task can use it.

In the case of intensive I/O type of load, it might make sense to set the value higher than the number of CPU cores.

Example
threads = 10

thread_stack

Maximum stack size for a job (coroutine, one search query may cause multiple jobs/coroutines). Optional, default is 128K.

Each job has its own stack of 128K. When you run a query, it's checked for how much stack it requires. If the default 128K is enough, it's just processed. If it needs more, another job with an increased stack is scheduled, which continues processing. The maximum size of such an advanced stack is limited by this setting.

Setting the value to a reasonably high rate will help with processing very deep queries without implying that overall RAM consumption will grow too high. For example, setting it to 1G does not imply that every new job will take 1G of RAM, but if we see that it requires, let's say, 100M stack, we just allocate 100M for the job. Other jobs at the same time will be running with their default 128K stack. The same way, we can run even more complex queries that need 500M. And only if we see internally that the job requires more than 1G of stack, we will fail and report about too low thread_stack.

However, in practice, even a query which needs 16M of stack is often too complex for parsing and consumes too much time and resources to be processed. So, the daemon will process it, but limiting such queries by the thread_stack setting looks quite reasonable.

Example
thread_stack = 8M

Determines whether to unlink .old table copies on successful rotation. Optional, default is 1 (do unlink).

Example
unlink_old = 0

watchdog

Threaded server watchdog. Optional, default is 1 (watchdog enabled).

When a Manticore query crashes, it can take down the entire server. With the watchdog feature enabled, searchd also maintains a separate lightweight process that monitors the main server process and automatically restarts it in case of abnormal termination. The watchdog is enabled by default.

Example
watchdog = 0 # disable watchdog
¶ 20.2

Section "Common" in configuration

lemmatizer_base

The lemmatizer_base is an optional configuration directive that specifies the base path for lemmatizer dictionaries. The default path is /usr/share/manticore

The lemmatizer implementation in Manticore Search (see Morphology to learn what lemmatizers are) is dictionary-driven and requires specific dictionary files for different languages. These files can be downloaded from the Manticore website (https://manticoresearch.com/install/#other-downloads).

Example:

lemmatizer_base = /usr/share/manticore/

progressive_merge

The progressive_merge is a configuration directive that, when enabled, merges real-time table disk chunks from smaller to larger ones. This approach speeds up the merging process and reduces read/write amplification. By default, this setting is enabled. If disabled, the chunks are merged in the order they were created.

json_autoconv_keynames

The json_autoconv_keynames is an optional configuration directive that determines if and how to auto-convert key names within JSON attributes. The known value is 'lowercase'. By default, this setting is unspecified (meaning no conversion occurs).

When set to lowercase, key names within JSON attributes will be automatically converted to lowercase during indexing. This conversion applies to JSON attributes from all data sources, including SQL and XMLpipe2.

Example:

json_autoconv_keynames = lowercase

json_autoconv_numbers

The json_autoconv_numbers is an optional configuration directive that determines whether to automatically detect and convert JSON strings that represent numbers into numeric attributes. The default value is 0 (do not convert strings into numbers).

When this option is set to 1, values such as "1234" will be indexed as numbers instead of strings. If the option is set to 0, such values will be indexed as strings. This conversion applies to JSON attributes from all data sources, including SQL and XMLpipe2.

Example:

json_autoconv_numbers = 1

on_json_attr_error

on_json_attr_error is an optional configuration directive that specifies the action to take if JSON format errors are found. The default value is ignore_attr(ignore errors). This setting applies only to sql_attr_json attributes.

By default, JSON format errors are ignored (ignore_attr), and the indexer tool will show a warning. Setting this option to fail_index will cause indexing to fail at the first JSON format error.

Example:

on_json_attr_error = ignore_attr

plugin_dir

The plugin_dir is an optional configuration directive that specifies the trusted location for dynamic libraries (UDFs). The default path is /usr/local/lib/manticore/.

This directive sets the trusted directory from which the UDF libraries can be loaded.

Example:

plugin_dir = /usr/local/lib/manticore/
¶ 20.3

Special suffixes

Manticore Search supports the use of special suffixes to simplify numeric values with specific meanings. These suffixes are categorized into size suffixes and time suffixes. The common format for suffixes is an integer followed by a literal, such as 10k or 100d. Literals are case-insensitive, so 10W and 10w are considered the same.

¶ 20.4

Scripted configuration

Manticore configuration supports shebang syntax, allowing the configuration to be written in a programming language and interpreted at loading. This enables dynamic settings, such as generating tables by querying a database table, modifying settings based on external factors, or including external files containing table and source declarations.

The configuration file is parsed by the declared interpreter, and the output is used as the actual configuration. This occurs each time the configuration is read, not only at searchd startup.

Note: This feature is not available on the Windows platform.

In the following example, PHP is used to create multiple tables with different names and to scan a specific folder for files containing extra table declarations:

#!/usr/bin/php
...
<?php for ($i=1; $i<=6; $i++) { ?>
table test_<?=$i?> {
  type = rt
  path = /var/lib/manticore/data/test_<?=$i?>
  rt_field = subject
  ...
 }
 <?php } ?>
 ...

 <?php
 $confd_folder='/etc/manticore.conf.d/';
 $files = scandir($confd_folder);
 foreach($files as $file)
 {
         if(($file == '.') || ($file =='..'))
         {} else {
                 $fp = new SplFileInfo($confd_folder.$file);
                 if('conf' == $fp->getExtension()){
                         include ($confd_folder.$file);
                 }
         }
 }
 ?>
¶ 20.5

Comments

Manticore Search's configuration file supports comments, which help provide explanations or notes within the configuration file. The # character is used to start a comment section. You can place the comment character either at the beginning of a line or inline within a line.

When using comments, be cautious when incorporating the # character in character tokenization settings, as everything following it will be ignored. To prevent this, use the # UTF-8 code, which is U+23.

If you need to use the # character within your configuration file, such as within database credentials in source declarations, you can escape it using a backslash \. This allows you to include the # character in your settings without it being interpreted as the start of a comment.

¶ 20.6

Inheritance of index and source declarations

Inheritance in index and source declarations enables better organization of tables with similar settings or structures and reduces the configuration size. Both parent and child tables or sources can utilize inheritance.

No specific configurations are needed for a parent table or source.

In the child table or source declaration, specify the table or source name followed by a colon (:) and the parent name:

table parent {
path = /var/lib/manticore/parent
...
}

table child:parent {
path = /var/lib/manticore/child
...
}

The child will inherit the entire configuration of the parent. Any settings declared in the child will overwrite the inherited values. Be aware that for multi-value settings, defining a single value in the child will clear all inherited values. For example, if the parent has several sql_query_pre declarations and the child has a single sql_query_pre declaration, all inherited sql_query_pre declarations are cleared. To override some of the inherited values from the parent, explicitly declare them in the child. This is also applicable if you don't need a value from the parent. For example, if the sql_query_pre value from the parent is not needed, declare the directive with an empty value in the child like sql_query_pre=.

Note that existing values of a multi-value setting will not be copied if the child declares one value for that setting.

The inheritance behavior applies to fields and attributes, not just table options. For example, if the parent has two integer attributes and the child needs a new integer attribute, the integer attribute declarations from the parent must be copied into the child configuration.

¶ 20.7

Setting variables online

SET

SET [GLOBAL] server_variable_name = value
SET [INDEX index_name] GLOBAL @user_variable_name = (int_val1 [, int_val2, ...])
SET NAMES value [COLLATE value]
SET @@dummy_variable = ignored_value

The SET statement in Manticore Search allows you to modify variable values. Variable names are case-insensitive, and no variable value changes will persist after a server restart.

Manticore Search supports the SET NAMES statement and SET @@variable_name syntax for compatibility with third-party MySQL client libraries, connectors, and frameworks that may require running these statements when connecting. However, these statements do not have any effect on Manticore Search itself.

There are four classes of variables in Manticore Search:

  1. Per-session server variable: set var_name = value
  2. Global server variable: set global var_name = value
  3. Global user variable: set global @var_name = (value)
  4. Global distributed variable: set index dist_index_name global @var_name = (value)

Global user variables are shared between concurrent sessions. The only supported value type is a list of BIGINTs, and these variables can be used with the IN() operator for filtering purposes. The primary use case for this feature is to upload large lists of values to searchd once and reuse them multiple times later, reducing network overhead. Global user variables can be transferred to all agents of a distributed table or set locally in the case of a local table defined in a distributed table. Example:

// in session 1
mysql> SET GLOBAL @myfilter=(2,3,5,7,11,13);
Query OK, 0 rows affected (0.00 sec)

// later in session 2
mysql> SELECT * FROM test1 WHERE group_id IN @myfilter;
+------+--------+----------+------------+-----------------+------+
| id   | weight | group_id | date_added | title           | tag  |
+------+--------+----------+------------+-----------------+------+
|    3 |      1 |        2 | 1299338153 | another doc     | 15   |
|    4 |      1 |        2 | 1299338153 | doc number four | 7,40 |
+------+--------+----------+------------+-----------------+------+
2 rows in set (0.02 sec)

Manticore Search supports per-session and global server variables that affect specific server settings in their respective scopes. Below is a list of known per-session and global server variables:

Known per-session server variables:

By default, Manticore starts a thread pool with a calculated number of CPU cores. All typical tasks are then distributed into this thread pool to ensure maximum CPU utilization. For instance, when a real-time table has several disk chunks, the search will be parallelized over the CPU cores. Similarly, for a single full-text search over a single index, the daemon will attempt to optimize search execution in parallel, using a technique referred to as "pseudo-sharding". Both features heavily depend on the total number of CPU cores and the number of free cores available for immediate use. This approach enhances performance but can make incident investigation more challenging. For example, a query doing `COUNT(*)` may return an approximate result (e.g., greater than 100 matches), and a subsequent execution of the same query may yield an exact result (e.g., exactly 120 matches). This variability depends on the available cores, but since this factor is unpredictable, it generally leads to non-reproducible results. Although this is usually acceptable, it can sometimes pose a problem. The `threads_ex` option specifies a desired CPU cores configuration, making queries with this configuration reproducible. `threads_ex` sets the CPU template for standard tasks and for pseudo-sharding, as pseudo-sharding can be part of the standard parallelization process. For example, if there are several disk chunks, they will be queried in parallel, but each may be further parallelized using pseudo-sharding. Thus, to manage this situation effectively, you need a couple of templates for each task type. A template is a string like `10/3`, where 10 represents concurrency and 3 represents batch size. If concurrency is 0, the default concurrency will be used. If batch size is 0, the default trivial template will be used. Any zero value can be omitted or replaced with `*`. The default (trivial) template can be described as `''`, and also as `*/*`, `0/0`, `0/`, `*/`, `/0`, `*`, etc. This means the daemon uses all available CPU cores without special batching limitations. A trivial template with 20 threads can be expressed as `20/*`, `20/0`, `20/`, or simply `20`. A round-robin template with a batch size of 2 is `*/2`, `0/2`, or simply `/2`. A round-robin dispatcher with 20 threads and a batch size of 3 is `20/3`. `threads_ex` is a template for basic tasks and for pseudo-sharding, separated by `+`, like: * `30+3` - a trivial base of 30 threads + trivial pseudo-sharding of 3 threads * `+/2` - a trivial base + round-robin pseudo-sharding with default threads and batch=2 * `10` - a trivial base of 10 threads + default trivial pseudo-sharding * `/1+10` - a round-robin base with default threads and batch=1 + trivial pseudo-sharding with 10 threads * `4/2+2/1` - a round-robin base with 4 threads and batch=2 + round-robin pseudo-sharding with 2 threads and batch=1 * `1+1` - the most deterministic case. Exactly 1 thread + 1 pseudo-shard, i.e., no parallelization at all. With this setting, you can reproducibly repeat the same problematic query and investigate behavior details. Furthermore, if you set `throttling_period=0`, your query will 'stick' to the current thread and never be rescheduled during execution, creating an ideal environment for troubleshooting. The option can be set globally from outside as an environment variable `MANTICORE_THREADS_EX`, like: ```bash export MANTICORE_THREADS_EX=8 export MANTICORE_THREADS_EX='16+8/2' ``` Or, via the MySQL CLI, as: ```sql SET threads_ex='16'; SET GLOBAL threads_ex='/2'; ``` Or, as a query parameter, like: ```sql SELECT ... OPTION threads_ex='1+1'; ``` The `threads_ex` configuration follows a hierarchy: environment variables first, then the global variable, and lastly, query options, allowing specific settings to override general ones.

Known global server variables are:

Examples:

mysql> SET autocommit=0;
Query OK, 0 rows affected (0.00 sec)

mysql> SET GLOBAL query_log_format=sphinxql;
Query OK, 0 rows affected (0.00 sec)

mysql> SET GLOBAL @banned=(1,2,3);
Query OK, 0 rows affected (0.01 sec)

mysql> SET INDEX users GLOBAL @banned=(1,2,3);
Query OK, 0 rows affected (0.01 sec)

To make user variables persistent, make sure sphinxql_state is enabled.

¶ 21

Integration

¶ 21.1

Integration with Logstash

Logstash is a log management tool that collects data from a variety of sources, transforms it on the fly, and sends it to your desired destination. It is often used as a data pipeline for Elasticsearch, an open-source analytics and search engine.

Now, Manticore supports the use of Logstash as a processing pipeline. This allows the collected and transformed data to be sent to Manticore just like to Elasticsearch. Currently, all the versions >= 7.6 are supported.

Let’s examine a simple example of a Logstash config file used for indexing dpkg.log, a standard log file of the Debian package manager. The log itself has a simple structure, as shown below:

2023-05-31 10:42:55 status triggers-awaited ca-certificates-java:all 20190405ubuntu1.1
2023-05-31 10:42:55 trigproc libc-bin:amd64 2.31-0ubuntu9.9 <none>
2023-05-31 10:42:55 status half-configured libc-bin:amd64 2.31-0ubuntu9.9
2023-05-31 10:42:55 status installed libc-bin:amd64 2.31-0ubuntu9.9
2023-05-31 10:42:55 trigproc systemd:amd64 245.4-4ubuntu3.21 <none>

Logstash configuration

Here is an example Logstash configuration:

input {
  file {
    path => ["/var/log/dpkg.log"]
    start_position => "beginning"
    sincedb_path => "/dev/null"
    mode => "read"
    exit_after_read => "true"
   file_completed_action => "log"
   file_completed_log_path => "/dev/null"
  }
}

output {
  elasticsearch {
   index => " dpkg_log"
   hosts => ["http://localhost:9308"]
   ilm_enabled => false
   manage_template => false
  }
}

Note that, before proceeding further, one crucial caveat needs to be addressed: Manticore does not support Log Template Management and the Index Lifecycle Management features of Elasticsearch. As these features are enabled by default in Logstash, they need to be explicitly disabled in the config. Additionally, the hosts option in the output config section must correspond to Manticore’s HTTP listen port (default is localhost:9308).

Logstash results

After adjusting the config as described, you can run Logstash, and the data from the dpkg log will be passed to Manticore and properly indexed.

Here is the resulting schema of the created table and an example of the inserted document:

mysql> DESCRIBE dpkg_log;
+------------------+--------+---------------------+
| Field            | Type   | Properties          |
+------------------+--------+---------------------+
| id               | bigint |                     |
| message          | text   | indexed stored      |
| @version         | text   | indexed stored      |
| @timestamp       | text   | indexed stored      |
| path             | text   | indexed stored      |
| host             | text   | indexed stored      |
+------------------+--------+---------------------+
mysql> SELECT * FROM dpkg_log LIMIT 1\G


*************************** 1. row ***************************
id: 7280000849080746110
host: logstash-db848f65f-lnlf9
message: 2023-04-12 02:03:21 status unpacked libc-bin:amd64 2.31-0ubuntu9
path: /var/log/dpkg.log
@timestamp: 2023-06-16T09:23:57.405Z
@version: 1
¶ 21.2

Integration with Filebeat

Filebeat is a lightweight shipper for forwarding and centralizing log data. Once installed as an agent, it monitors the log files or locations you specify, collects log events, and forwards them for indexing, usually to Elasticsearch or Logstash.

Now, Manticore also supports the use of Filebeat as processing pipelines. This allows the collected and transformed data to be sent to Manticore just like to Elasticsearch. Currently, all the versions >= 7.10 are supported.

Filebeat configuration

Below is a Filebeat config to work with our example dpkg log:

filebeat.inputs:
- type: filestream
  id: example
  paths:
    - /var/log/dpkg.log

output.elasticsearch:
  hosts: ["http://localhost:9308"]
  index:  "dpkg_log"
  allow_older_versions: true

setup.ilm:
  enabled: false

setup.template:
  name: "dpkg_log"
  pattern: "dpkg_log"

Configuration for Filebeat versions >= 8.11

Note that Filebeat versions higher than 8.10 have the output compression feature enabled by default. That is why the compression_level: 0 option must be added to the configuration file to provide compatibility with Manticore:

filebeat.inputs:
- type: filestream
  id: example
  paths:
    - /var/log/dpkg.log

output.elasticsearch:
  hosts: ["http://localhost:9308"]
  index:  "dpkg_log"
  allow_older_versions: true
  compression_level: 0

setup.ilm:
  enabled: false

setup.template:
  name: "dpkg_log"
  pattern: "dpkg_log"

Filebeat results

Once you run Filebeat with this configuration, log data will be sent to Manticore and properly indexed. Here is the resulting schema of the table created by Manticore and an example of the inserted document:

mysql> DESCRIBE dpkg_log;
+------------------+--------+--------------------+
| Field            | Type   | Properties         |
+------------------+--------+--------------------+
| id               | bigint |                    |
| @timestamp       | text   | indexed stored     |
| message          | text   | indexed stored     |
| log              | json   |                    |
| input            | json   |                    |
| ecs              | json   |                    |
| host             | json   |                    |
| agent            | json   |                    |
+------------------+--------+--------------------+
mysql> SELECT * FROM dpkg_log LIMIT 1\G

*************************** 1. row ***************************
id: 7280000849080753116
@timestamp: 2023-06-16T09:27:38.792Z
message: 2023-04-12 02:06:08 status half-installed libhogweed5:amd64 3.5.1+really3.5.1-2
input: {"type":"filestream"}
ecs: {"version":"1.6.0"}
host: {"name":"logstash-db848f65f-lnlf9"}
agent: {"ephemeral_id":"587c2ebc-e7e2-4e27-b772-19c611115996","id":"2e3d985b-3610-4b8b-aa3b-2e45804edd2c","name":"logstash-db848f65f-lnlf9","type":"filebeat","version":"7.10.0","hostname":"logstash-db848f65f-lnlf9"}
log: {"offset":80,"file":{"path":"/var/log/dpkg.log"}}
¶ 22

Extensions

¶ 22.1

SphinxSE

SphinxSE is a MySQL storage engine that can be compiled into MySQL/MariaDB servers using their pluggable architecture.

Despite its name, SphinxSE does not actually store any data itself. Instead, it serves as a built-in client that enables the MySQL server to communicate with searchd, execute search queries, and retrieve search results. All indexing and searching take place outside MySQL.

Some common SphinxSE applications include:

Installing SphinxSE

You will need to obtain a copy of MySQL sources, prepare those, and then recompile MySQL binary. MySQL sources (mysql-5.x.yy.tar.gz) could be obtained from http://dev.mysql.com website.

Compiling MySQL 5.0.x with SphinxSE

  1. copy sphinx.5.0.yy.diff patch file into MySQL sources directory and run
$ patch -p1 < sphinx.5.0.yy.diff

If there's no .diff file exactly for the specific version you need to: build, try applying .diff with closest version numbers. It is important that the patch should apply with no rejects.
2. in MySQL sources directory, run

$ sh BUILD/autorun.sh
  1. in MySQL sources directory, create sql/sphinx directory in and copy all files in mysqlse directory from Manticore sources there. Example:
$ cp -R /root/builds/sphinx-0.9.7/mysqlse /root/builds/mysql-5.0.24/sql/sphinx
  1. configure MySQL and enable the new engine:
$ ./configure --with-sphinx-storage-engine
  1. build and install MySQL:
$ make
$ make install

Compiling MySQL 5.1.x with SphinxSE

  1. In the MySQL sources directory, create a storage/sphinx directory and copy all files from the mysqlse directory in the Manticore sources to this new location. For example:
$ cp -R /root/builds/sphinx-0.9.7/mysqlse /root/builds/mysql-5.1.14/storage/sphinx
  1. In the MySQL source directory, run:
$ sh BUILD/autorun.sh
  1. Configure MySQL and enable the Manticore engine:
$ ./configure --with-plugins=sphinx
  1. Build and install MySQL:
$ make
$ make install

Checking SphinxSE installation

To verify that SphinxSE has been successfully compiled into MySQL, start the newly built server, run the MySQL client, and issue the SHOW ENGINES query. You should see a list of all available engines. Manticore should be present, and the "Support" column should display "YES":

Req
mysql> show engines;
+------------+----------+-------------------------------------------------------------+
| Engine     | Support  | Comment                                                     |
+------------+----------+-------------------------------------------------------------+
| MyISAM     | DEFAULT  | Default engine as of MySQL 3.23 with great performance      |
  ...
| SPHINX     | YES      | Manticore storage engine                                       |
  ...
+------------+----------+-------------------------------------------------------------+
13 rows in set (0.00 sec)

Using SphinxSE

To search using SphinxSE, you'll need to create a special ENGINE=SPHINX "search table" and then use a SELECT statement with the full-text query placed in the WHERE clause for the query column.

Here's an example create statement and search query:

CREATE TABLE t1
(
    id          INTEGER UNSIGNED NOT NULL,
    weight      INTEGER NOT NULL,
    query       VARCHAR(3072) NOT NULL,
    group_id    INTEGER,
    INDEX(query)
) ENGINE=SPHINX CONNECTION="sphinx://localhost:9312/test";

SELECT * FROM t1 WHERE query='test it;mode=any';

In a search table, the first three columns must have the following types: INTEGER UNSIGNED or BIGINT for the 1st column (document ID), INTEGER or BIGINT for the 2nd column (match weight), and VARCHAR or TEXT for the 3rd column (your query). This mapping is fixed; you cannot omit any of these three required columns, move them around, or change their types. Additionally, the query column must be indexed, while all others should remain unindexed. Column names are ignored, so you can use any arbitrary names.

Additional columns must be either INTEGER, TIMESTAMP, BIGINT, VARCHAR, or FLOAT. They will be bound to attributes provided in the Manticore result set by name, so their names must match the attribute names specified in sphinx.conf. If there's no matching attribute name in the Manticore search results, the column will have NULL values.

Special "virtual" attribute names can also be bound to SphinxSE columns. Use _sph_ instead of @ for that purpose. For example, to obtain the values of @groupby, @count, or @distinct virtual attributes, use _sph_groupby, _sph_count, or _sph_distinct column names, respectively.

The CONNECTION string parameter is used to specify the Manticore host, port, and table. If no connection string is specified in CREATE TABLE, the table name * (i.e., search all tables) and localhost:9312 are assumed. The connection string syntax is as follows:

CONNECTION="sphinx://HOST:PORT/TABLENAME"

You can change the default connection string later:

mysql> ALTER TABLE t1 CONNECTION="sphinx://NEWHOST:NEWPORT/NEWTABLENAME";

You can also override these parameters on a per-query basis.

As shown in the example, both the query text and search options should be placed in the WHERE clause on the search query column (i.e., the 3rd column). Options are separated by semicolons and their names from values by an equality sign. Any number of options can be specified. The available options are:

... WHERE query='test;sort=attr_asc:group_id';
... WHERE query='test;sort=extended:@weight desc, group_id asc';
... WHERE query='test;index=test1;';
... WHERE query='test;index=test1,test2,test3;';
... WHERE query='test;weights=1,2,3;';
# only include groups 1, 5 and 19
... WHERE query='test;filter=group_id,1,5,19;';
# exclude groups 3 and 11
... WHERE query='test;!filter=group_id,3,11;';
# include groups from 3 to 7, inclusive
... WHERE query='test;range=group_id,3,7;';
# exclude groups from 5 to 25
... WHERE query='test;!range=group_id,5,25;';
# filter by a float size
... WHERE query='test;floatrange=size,2,3;';
# pick all results within 1000 meter from geoanchor
... WHERE query='test;floatrange=@geodist,0,1000;';
... WHERE query='test;maxmatches=2000;';
... WHERE query='test;cutoff=10000;';
... WHERE query='test;maxquerytime=1000;';
... WHERE query='test;groupby=day:published_ts;';
... WHERE query='test;groupby=attr:group_id;';
... WHERE query='test;groupsort=@count desc;';
... WHERE query='test;groupby=attr:country_id;distinct=site_id';
... WHERE query='test;indexweights=tbl_exact,2,tbl_stemmed,1;';
... WHERE query='test;fieldweights=title,10,abstract,3,content,1;';
... WHERE query='test;comment=marker001;';
... WHERE query='test;select=2*a+3*** as myexpr;';
... WHERE query='test;host=sphinx-test.loc;port=7312;';
... WHERE query='test;mode=extended;ranker=bm25;';
... WHERE query='test;mode=extended;ranker=expr:sum(lcs);';

The "export" ranker functions similarly to ranker=expr, but it retains the per-document factor values, while ranker=expr discards them after computing the final WEIGHT() value. Keep in mind that ranker=export is intended for occasional use, such as training a machine learning (ML) function or manually defining your own ranking function, and should not be used in actual production. When utilizing this ranker, you'll likely want to examine the output of the RANKFACTORS() function, which generates a string containing all the field-level factors for each document.

Req
SELECT *, WEIGHT(), RANKFACTORS()
    FROM myindex
    WHERE MATCH('dog')
    OPTION ranker=export('100*bm25');
*************************** 1\. row ***************************
           id: 555617
    published: 1110067331
   channel_id: 1059819
        title: 7
      content: 428
     weight(): 69900
rankfactors(): bm25=699, bm25a=0.666478, field_mask=2,
doc_word_count=1, field1=(lcs=1, hit_count=4, word_count=1,
tf_idf=1.038127, min_idf=0.259532, max_idf=0.259532, sum_idf=0.259532,
min_hit_pos=120, min_best_span_pos=120, exact_hit=0,
max_window_hits=1), word1=(tf=4, idf=0.259532)

*************************** 2\. row ***************************
           id: 555313
    published: 1108438365
   channel_id: 1058561
        title: 8
      content: 249
     weight(): 68500
rankfactors(): bm25=685, bm25a=0.675213, field_mask=3,
doc_word_count=1, field0=(lcs=1, hit_count=1, word_count=1,
tf_idf=0.259532, min_idf=0.259532, max_idf=0.259532, sum_idf=0.259532,
min_hit_pos=8, min_best_span_pos=8, exact_hit=0, max_window_hits=1),
field1=(lcs=1, hit_count=2, word_count=1, tf_idf=0.519063,
min_idf=0.259532, max_idf=0.259532, sum_idf=0.259532, min_hit_pos=36,
min_best_span_pos=36, exact_hit=0, max_window_hits=1), word1=(tf=3,
idf=0.259532)
... WHERE query='test;geoanchor=latattr,lonattr,0.123,0.456';

One very important note is that it is much more efficient to let Manticore handle sorting, filtering, and slicing the result set, rather than increasing the max matches count and using WHERE, ORDER BY, and LIMIT clauses on the MySQL side. This is due to two reasons. First, Manticore employs a variety of optimizations and performs these tasks better than MySQL. Second, less data would need to be packed by searchd, transferred, and unpacked by SphinxSE.

You can obtain additional information related to the query results using the SHOW ENGINE SPHINX STATUS statement:

Req
mysql> SHOW ENGINE SPHINX STATUS;
+--------+-------+-------------------------------------------------+
| Type   | Name  | Status                                          |
+--------+-------+-------------------------------------------------+
| SPHINX | stats | total: 25, total found: 25, time: 126, words: 2 |
| SPHINX | words | sphinx:591:1256 soft:11076:15945                |
+--------+-------+-------------------------------------------------+
2 rows in set (0.00 sec)

You can also access this information through status variables. Keep in mind that using this method does not require super-user privileges.

Req
mysql> SHOW STATUS LIKE 'sphinx_%';
+--------------------+----------------------------------+
| Variable_name      | Value                            |
+--------------------+----------------------------------+
| sphinx_total       | 25                               |
| sphinx_total_found | 25                               |
| sphinx_time        | 126                              |
| sphinx_word_count  | 2                                |
| sphinx_words       | sphinx:591:1256 soft:11076:15945 |
+--------------------+----------------------------------+
5 rows in set (0.00 sec)

SphinxSE search tables can be joined with tables using other engines. Here's an example using the "documents" table from example.sql:

Req
mysql> SELECT content, date_added FROM test.documents docs
-> JOIN t1 ON (docs.id=t1.id)
-> WHERE query="one document;mode=any";

mysql> SHOW ENGINE SPHINX STATUS;
+-------------------------------------+---------------------+
| content                             | docdate             |
+-------------------------------------+---------------------+
| this is my test document number two | 2006-06-17 14:04:28 |
| this is my test document number one | 2006-06-17 14:04:28 |
+-------------------------------------+---------------------+
2 rows in set (0.00 sec)

+--------+-------+---------------------------------------------+
| Type   | Name  | Status                                      |
+--------+-------+---------------------------------------------+
| SPHINX | stats | total: 2, total found: 2, time: 0, words: 2 |
| SPHINX | words | one:1:2 document:2:2                        |
+--------+-------+---------------------------------------------+
2 rows in set (0.00 sec)

Building snippets via MySQL

SphinxSE also features a UDF function that allows you to create snippets using MySQL. This functionality is similar to HIGHLIGHT(), but can be accessed through MySQL+SphinxSE.

The binary providing the UDF is called sphinx.so and should be automatically built and installed in the appropriate location along with SphinxSE. If it doesn't install automatically for some reason, locate sphinx.so in the build directory and copy it to your MySQL instance's plugins directory. Once done, register the UDF with the following statement:

CREATE FUNCTION sphinx_snippets RETURNS STRING SONAME 'sphinx.so';

The function name must be sphinx_snippets; you cannot use an arbitrary name. The function arguments are as follows:

Prototype: function sphinx_snippets ( document, table, words [, options] );

The document and words arguments can be either strings or table columns. Options must be specified like this: 'value' AS option_name. For a list of supported options, refer to the Highlighting section. The only UDF-specific additional option is called sphinx and allows you to specify the searchd location (host and port).

Usage examples:

SELECT sphinx_snippets('hello world doc', 'main', 'world',
    'sphinx://192.168.1.1/' AS sphinx, true AS exact_phrase,
    '[**]' AS before_match, '[/**]' AS after_match)
FROM documents;

SELECT title, sphinx_snippets(text, 'index', 'mysql php') AS text
    FROM sphinx, documents
    WHERE query='mysql php' AND sphinx.id=documents.id;
¶ 22.2

FEDERATED

With the MySQL FEDERATED engine, you can connect to a local or remote Manticore instance from MySQL/MariaDB and perform search queries.

Using FEDERATED

An actual Manticore query can't be used directly with the FEDERATED engine and must be "proxied" (sent as a string in a column) due to the FEDERATED engine's limitations and the fact that Manticore implements custom syntax like the MATCH clause.

To search via FEDERATED, you first need to create a FEDERATED engine table. The Manticore query will be included in a query column in the SELECT performed over the FEDERATED table.

Creating a FEDERATED-compatible MySQL table:

SQL
CREATE TABLE t1
(
    id          INTEGER UNSIGNED NOT NULL,
    year        INTEGER NOT NULL,
    rating      FLOAT,
    query       VARCHAR(1024) NOT NULL,
    INDEX(query)
) ENGINE=FEDERATED
DEFAULT CHARSET=utf8
CONNECTION='mysql://FEDERATED@127.0.0.1:9306/DB/movies';
Query OK, 0 rows affected (0.00 sec)

Query FEDERATED compatible table:

SQL
SELECT * FROM t1 WHERE query='SELECT * FROM movies WHERE MATCH (\'pie\')';
+----+------+--------+------------------------------------------+
| id | year | rating | query                                    |
+----+------+--------+------------------------------------------+
|  1 | 2019 |      5 | SELECT * FROM movies WHERE MATCH ('pie') |
+----+------+--------+------------------------------------------+
1 row in set (0.04 sec)

The only fixed mapping is the query column. It is mandatory and must be the only column with a table attached.

The Manticore table linked via FEDERATED must be a physical table (plain or real-time).

The FEDERATED table should have columns with the same names as the remote Manticore table attributes since they will be bound to the attributes provided in the Manticore result set by name. However, it might map only some attributes, not all of them.

Manticore server identifies a query from a FEDERATED client by the user name "FEDERATED". The CONNECTION string parameter is used to specify the Manticore host, SQL port, and tables for queries coming through the connection. The connection string syntax is as follows:

CONNECTION="mysql://FEDERATED@HOST:PORT/DB/TABLENAME"

Since Manticore doesn't have the concept of a database, the DB string can be random as it will be ignored by Manticore, but MySQL requires a value in the CONNECTION string definition. As seen in the example, the full SELECT SQL query should be placed in a WHERE clause against the query column.

Only the SELECT statement is supported, not INSERT, REPLACE, UPDATE, or DELETE.

FEDERATED tips

One very important note is that it is much more efficient to allow Manticore to perform sorting, filtering, and slicing the result set than to increase the max matches count and use WHERE, ORDER BY, and LIMIT clauses on the MySQL side. This is for two reasons. First, Manticore implements a number of optimizations and performs better than MySQL for these tasks. Second, less data needs to be packed by searchd, transferred, and unpacked between Manticore and MySQL.

JOINs can be performed between a FEDERATED table and other MySQL tables. This can be used to retrieve information that is not stored in a Manticore table.

SQL
SELECT t1.id, t1.year, comments.comment FROM t1 JOIN comments ON t1.id=comments.post_id WHERE query='SELECT * FROM movies WHERE MATCH (\'pie\')';
+----+------+--------------+
| id | year | comment      |
+----+------+--------------+
|  1 | 2019 | was not good |
+----+------+--------------+
1 row in set (0.00 sec)
¶ 22.3

UDFs and Plugins

Manticore can be extended with user-defined functions, or UDFs for short, like this:

SELECT id, attr1, myudf (attr2, attr3+attr4) ...

You can dynamically load and unload UDFs into searchd without having to restart the server, and use them in expressions when searching, ranking, etc. A quick summary of the UDF features is as follows:

We do not yet support aggregation functions. In other words, your UDFs will be called for just a single document at a time and are expected to return some value for that document. Writing a function that can compute an aggregate value like AVG() over the entire group of documents that share the same GROUP BY key is not yet possible. However, you can use UDFs within the built-in aggregate functions: that is, even though MYCUSTOMAVG() is not supported yet, AVG(MYCUSTOMFUNC()) should work just fine!

UDFs offer a wide range of applications, such as:

Plugins

Plugins offer additional opportunities to expand search functionality. They can currently be used to compute custom rankings and tokenize documents and queries.

Here's the complete list of plugin types:

This section covers the general process of writing and managing plugins; specifics related to creating different types of plugins are discussed in their respective subsections.

So, how do you write and use a plugin? Here's a quick four-step guide:

Note that while UDFs are first-class plugins, they are installed using a separate CREATE FUNCTION statement. This allows for a neat specification of the return type, without sacrificing backward compatibility or changing the syntax.

Dynamic plugins are supported in threads and thread_pool workers. Multiple plugins (and/or UDFs) can be contained in a single library file. You may choose to either group all project-specific plugins in one large library or create a separate library for each UDF and plugin; it's up to you.

As with UDFs, you should include the src/sphinxudf.h header file. At the very least, you'll need the SPH_UDF_VERSION constant to implement an appropriate version function. Depending on the specific plugin type, you may or may not need to link your plugin with src/sphinxudf.c. However, all functions implemented in sphinxudf.c are related to unpacking the PACKEDFACTORS() blob, and no plugin types have access to that data. So currently, linking with just the header should suffice. (In fact, if you copy over the UDF version number, you won't even need the header file for some plugin types.)

Formally, plugins are simply sets of C functions that adhere to a specific naming pattern. You're typically required to define one key function for the primary task, but you can also define additional functions. For instance, to implement a ranker called "myrank", you must define a myrank_finalize() function that returns the rank value. However, you can also define myrank_init(), myrank_update(), and myrank_deinit() functions. Specific sets of well-known suffixes and call arguments differ based on the plugin type, but _init() and _deinit() are generic, and every plugin has them. Hint: for a quick reference on known suffixes and their argument types, refer to sphinxplugin.h, where the call prototypes are defined at the beginning of the file.

Even though the public interface is defined in pure C, our plugins essentially follow an object-oriented model. Indeed, every _init() function receives a void ** userdata out-parameter, and the pointer value stored at (*userdata) is then passed as the first argument to all other plugin functions. So you can think of a plugin as a class that gets instantiated every time an object of that class is needed to handle a request: the userdata pointer serves as the this pointer; the functions act as methods, and the _init() and _deinit() functions work as constructor and destructor, respectively.

This minor OOP-in-C complication arises because plugins run in a multi-threaded environment, and some need to maintain state. You can't store that state in a global variable in your plugin, so we pass around a userdata parameter, which naturally leads to the OOP model. If your plugin is simple and stateless, the interface allows you to omit _init(), _deinit(), and any other functions.

To summarize, here's the simplest complete ranker plugin in just three lines of C code:

// gcc -fPIC -shared -o myrank.so myrank.c
#include "sphinxudf.h"
int myrank_ver() { return SPH_UDF_VERSION; }
int myrank_finalize(void *u, int w) { return 123; }

Here's how to use the simple ranker plugin:

mysql> CREATE PLUGIN myrank TYPE 'ranker' SONAME 'myrank.dll';
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT id, weight() FROM test1 WHERE MATCH('test') OPTION ranker=myrank('');
+------+----------+
| id   | weight() |
+------+----------+
|    1 |      123 |
|    2 |      123 |
+------+----------+
2 rows in set (0.01 sec)
¶ 22.3.1

Listing plugins

SHOW PLUGINS

SHOW PLUGINS

Displays all the loaded plugins (except for Buddy plugins, see below) and UDFs. The "Type" column should be one of the udf, ranker, index_token_filter, or query_token_filter. The "Users" column is the number of thread that are currently using that plugin in a query. The "Extra" column is intended for various additional plugin-type specific information; currently, it shows the return type for the UDFs and is empty for all the other plugin types.

Example
SHOW PLUGINS;
+------+----------+----------------+-------+-------+
| Type | Name     | Library        | Users | Extra |
+------+----------+----------------+-------+-------+
| udf  | sequence | udfexample.dll | 0     | INT   |
+------+----------+----------------+-------+-------+
1 row in set (0.00 sec)

SHOW BUDDY PLUGINS

SHOW BUDDY PLUGINS

This will display all available plugins, including core and local ones.
To remove a plugin, make sure to use the name listed in the Package column.

Example
SHOW BUDDY PLUGINS;
+------------------------------------------------+-----------------+---------+------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
| Package                                        | Plugin          | Version | Type | Info                                                                                                                                                     |
+------------------------------------------------+-----------------+---------+------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
| manticoresoftware/buddy-plugin-empty-string    | empty-string    | 2.1.5   | core | Handles empty queries, which can occur when trimming comments or dealing with specific SQL protocol instructions in comments that are not supported      |
| manticoresoftware/buddy-plugin-backup          | backup          | 2.1.5   | core | BACKUP sql statement                                                                                                                                     |
| manticoresoftware/buddy-plugin-emulate-elastic | emulate-elastic | 2.1.5   | core | Emulates some Elastic queries and generates responses as if they were made by ES                                                                         |
| manticoresoftware/buddy-plugin-insert          | insert          | 2.1.5   | core | Auto schema support. When an insert operation is performed and the table does not exist, it creates it with data types auto-detection                    |
| manticoresoftware/buddy-plugin-alias           | alias           | 2.1.5   | core |                                                                                                                                                          |
| manticoresoftware/buddy-plugin-select          | select          | 2.1.5   | core | Various SELECTs handlers needed for mysqldump and other software support, mostly aiming to work similarly to MySQL                                       |
| manticoresoftware/buddy-plugin-show            | show            | 2.1.5   | core | Various "show" queries handlers, for example, `show queries`, `show fields`, `show full tables`, etc                                                     |
| manticoresoftware/buddy-plugin-cli-table       | cli-table       | 2.1.5   | core | /cli endpoint based on /cli_json - outputs query result as a table                                                                                       |
| manticoresoftware/buddy-plugin-plugin          | plugin          | 2.1.5   | core | Core logic for plugin support and helpers. Also handles `create buddy plugin`, `delete buddy plugin`, and `show buddy plugins`                           |
| manticoresoftware/buddy-plugin-test            | test            | 2.1.5   | core | Test plugin, used exclusively for tests                                                                                                                  |
| manticoresoftware/buddy-plugin-insert-mva      | insert-mva      | 2.1.5   | core | Manages the restoration of MVA fields with mysqldump                                                                                                     |
| manticoresoftware/buddy-plugin-modify-table    | modify-table    | 2.1.5   | core | Assists in standardizing options in create and alter table statements to show option=1 for integers. Also manages the logic for creating sharded tables. |
| manticoresoftware/buddy-plugin-knn             | knn             | 2.1.5   | core | Enables KNN by document id                                                                                                                               |
| manticoresoftware/buddy-plugin-replace         | replace         | 2.1.5   | core | Enables partial replaces                                                                                                                                 |
+------------------------------------------------+-----------------+---------+------+----------------------------------------------------------------------------------------------------------------------------------------------------------+-----+
¶ 22.3.2

UDF

UDFs are stored in external dynamic libraries (.so files on UNIX and .dll on Windows systems). Library files must be placed in a trusted folder specified by the plugin_dir directive for security reasons: it's easier to secure a single folder than to allow anyone to install arbitrary code into searchd. You can dynamically load and unload UDFs into searchd using CREATE FUNCTION and DROP FUNCTION SQL statements, respectively. Additionally, you can seamlessly reload UDFs (and other plugins) with the RELOAD PLUGINS statement. Manticore keeps track of currently loaded functions; every time you create or drop a UDF, searchd updates its state in the sphinxql_state file as a plain SQL script.

UDFs are local. To use them on a cluster, you must place the same library on all nodes and run CREATE statements on each node as well. This process may change in future versions.

Once you successfully load a UDF, you can use it in your SELECT or other statements just like any built-in function:

SELECT id, MYCUSTOMFUNC (groupid, authorname), ... FROM myindex

Multiple UDFs (and other plugins) can reside in a single library. The library will only be loaded once and is automatically unloaded once all the UDFs and plugins within it are dropped.

In theory, you can write a UDF in any language, as long as its compiler can import standard C headers and emit standard dynamic libraries with properly exported functions. However, writing in C++ or plain C is the path of least resistance. We provide an example UDF library written in plain C that implements several functions (demonstrating various techniques) alongside our source code, found at src/udfexample.c. This example includes the src/sphinxudf.h header file, which contains definitions of several UDF-related structures and types. For most UDFs and plugins, simply using #include "sphinxudf.h" as shown in the example should be sufficient. However, if you're writing a ranking function and need to access ranking signals (factors) data from within the UDF, you'll also need to compile and link with src/sphinxudf.c (available in our source code), as the implementations of functions that let you access signal data from within the UDF reside in that file.

Both the sphinxudf.h header and sphinxudf.c are standalone, so you can copy those files individually; they don't depend on any other parts of Manticore's source code.

Within your UDF, you must implement and export only a couple of functions. First, for UDF interface version control, you must define a function int LIBRARYNAME_ver(), where LIBRARYNAME is the name of your library file, and you must return SPH_UDF_VERSION (a value defined in sphinxudf.h) from it. Here's an example.

#include <sphinxudf.h>

// our library will be called udfexample.so, thus, so it must define
// a version function named udfexample_ver()
int udfexample_ver()
{
    return SPH_UDF_VERSION;
}

This precaution protects you from accidentally loading a library with a mismatching UDF interface version into a newer or older searchd. Secondly, you must implement the actual function as well.

sphinx_int64_t testfunc ( SPH_UDF_INIT * init, SPH_UDF_ARGS * args, char * error_flag )
{
    return 123;
}

UDF function names in SQL are case-insensitive. However, the respective C function names are not; they need to be all lower-case, or the UDF will not load. More importantly, it is crucial that:

  1. the calling convention is C (aka __cdecl),
  2. the arguments list matches the plugin system expectations exactly, and
  3. the return type matches the one you specify in CREATE FUNCTION.

Unfortunately, there is no (easy) way for us to check for these mistakes when loading the function, and they could crash the server and/or result in unexpected results. Last but not least, all the C functions you implement need to be thread-safe.

The first argument, a pointer to SPH_UDF_INIT structure, is essentially a pointer to our function state. It is optional. In the example just above, the function is stateless, as it simply returns 123 every time it gets called. So, we do not have to define an initialization function, and we can simply ignore that argument.
This argument serves one more purpose. Since a single query can be executed on multiple threads (see pseudo-sharding), the daemon tries to determine whether a UDF is stateful or stateless by checking this argument. If the argument is initialized, parallel execution will be disabled. So, if your UDF is stateful but you don't use this argument, it will be called from multiple threads, and your code needs to be aware of that.

The second argument, a pointer to SPH_UDF_ARGS, is the most important one. All the actual call arguments are passed to your UDF via this structure; it contains the call argument count, names, types, etc. So, whether your function gets called like SELECT id, testfunc(1) or like SELECT id, testfunc('abc', 1000*id+gid, WEIGHT()) or any other way, it will receive the very same SPH_UDF_ARGS structure in all of these cases. However, the data passed in the args structure will be different. In the first example, args->arg_count will be set to 1, in the second example it will be set to 3, and the args->arg_types array will contain different type data, and so on.

Finally, the third argument is an error flag. A UDF can raise it to indicate that some kind of internal error occurred, the UDF cannot continue, and the query should terminate early. You should not use this for argument type checks or for any other error reporting that is likely to happen during normal use. This flag is designed to report sudden critical runtime errors, such as running out of memory.

If we wanted to, say, allocate temporary storage for our function to use, or check upfront whether the arguments are of the supported types, then we would need to add two more functions, for UDF initialization and deinitialization, respectively.

int testfunc_init ( SPH_UDF_INIT * init, SPH_UDF_ARGS * args,
    char * error_message )
{
    // allocate and initialize a little bit of temporary storage
    init->func_data = malloc ( sizeof(int) );

    *(int*)init->func_data = 123;

    // return a success code
    return 0;
}

void testfunc_deinit ( SPH_UDF_INIT * init )
{
    // free up our temporary storage
    free ( init->func_data );
}

Note how testfunc_init() also receives the call arguments structure. By the time it is called, it does not receive any actual values, so the args->arg_values will be NULL. But the argument names and types are known and will be passed. You can check them in the initialization function and return an error if they are of an unsupported type.

SPH_UDF_ARGS types

UDFs can receive arguments of pretty much any valid internal Manticore type. Refer to the sphinx_udf_argtype enumeration in sphinxudf.h for a full list. Most of the types map straightforwardly to the respective C types.

The most notable type is the SPH_UDF_TYPE_FACTORS argument type. You get that type by calling your UDF with a PACKEDFACTOR() argument. Its data is a binary blob in a certain internal format, and to extract individual ranking signals from that blob, you need to use either of the two sphinx_factors_XXX() or sphinx_get_YYY_factor() families of functions.

sphinx_factors_XXX() functions

This family consists of 3 functions.

First, you need to call init() and unpack(), then you can use the SPH_UDF_FACTORS fields, and finally, you need to clean up with deinit().

This approach is simple but may result in a bunch of memory allocations for each processed document, which could be slow.

sphinx_get_YYY_factor() functions

The other interface, consisting of a bunch of sphinx_get_YYY_factor() functions, is a bit more verbose to use but accesses the blob data directly and guarantees no allocations. For top-notch ranking UDF performance, you'll want to use this approach.

Return types of UDF

As for the return types, UDFs can currently return a single INTEGER, BIGINT, FLOAT, or STRING value. The C function return type should be sphinx_int64_t, sphinx_int64_t, double, or char* respectively. In the last case, you must use the args->fn_malloc function to allocate space for returned string values. Internally in your UDF, you can use whatever you want, so the testfunc_init() example above is correct code even though it uses malloc() directly: you manage that pointer yourself, it gets freed up using a matching free() call, and all is well. However, the returned strings values are managed by Manticore, and we have our own allocator, so for the return values specifically, you need to use it too.

Depending on how your UDFs are used in the query, the main function call (testfunc() in our example) might be called in a rather different volume and order. Specifically,

The calling sequence of the other functions is fixed, though. Namely,

¶ 22.3.2.1

CREATE FUNCTION

CREATE FUNCTION udf_name
    RETURNS {INT | INTEGER | BIGINT | FLOAT | STRING}
    SONAME 'udf_lib_file'

CREATE FUNCTION statement installs a user-defined function UDF with the specified name and type from the provided library file. The library file must be located in a trusted plugin_dir directory. Upon successful installation, the function becomes available for use in all subsequent queries received by the server. Example:

mysql> CREATE FUNCTION avgmva RETURNS INTEGER SONAME 'udfexample.dll';
Query OK, 0 rows affected (0.03 sec)

mysql> SELECT *, AVGMVA(tag) AS q from test1;
+------+--------+---------+-----------+
| id   | weight | tag     | q         |
+------+--------+---------+-----------+
|    1 |      1 | 1,3,5,7 | 4.000000  |
|    2 |      1 | 2,4,6   | 4.000000  |
|    3 |      1 | 15      | 15.000000 |
|    4 |      1 | 7,40    | 23.500000 |
+------+--------+---------+-----------+
¶ 22.3.2.2

DROP FUNCTION

DROP FUNCTION udf_name

DROP FUNCTION statement uninstalls a user-defined function UDF with the specified name. Upon successful removal, the function will no longer be available for use in subsequent queries. However, ongoing concurrent queries will not be affected, and if necessary, the library unloading will be delayed until those queries are completed. Example:

mysql> DROP FUNCTION avgmva;
Query OK, 0 rows affected (0.00 sec)
¶ 22.3.3

Plugins

¶ 22.3.3.1

System plugins

CREATE PLUGIN

CREATE PLUGIN plugin_name TYPE 'plugin_type' SONAME 'plugin_library'

Loads the given library (if it is not already loaded) and loads the specified plugin from it. The available plugin types include:

For more information on writing plugins, please refer to the plugins documentation.

mysql> CREATE PLUGIN myranker TYPE 'ranker' SONAME 'myplugins.so';
Query OK, 0 rows affected (0.00 sec)

CREATE BUDDY PLUGIN

Buddy plugins can extend Manticore Search's functionality and enable certain queries that are not natively supported. To learn more about creating Buddy plugins, we recommend reading this article.

To create a Buddy plugin, run the following SQL command:

CREATE PLUGIN <username/package name on https://packagist.org/> TYPE 'buddy' VERSION <package version>

You can also use an alias command specifically created for Buddy plugins, which is easier to remember:

CREATE BUDDY PLUGIN <username/package name on https://packagist.org/> VERSION <package version>

This command will install the show-hostname plugin to the plugin_dir and enable it without the need to restart the server.

Examples

Example
CREATE PLUGIN manticoresoftware/buddy-plugin-show-hostname TYPE 'buddy' VERSION 'dev-main';

CREATE BUDDY PLUGIN manticoresoftware/buddy-plugin-show-hostname VERSION 'dev-main';
¶ 22.3.3.2

DELETE PLUGIN

DROP PLUGIN plugin_name TYPE 'plugin_type'

Marks the designated plugin for unloading. The unloading process is not instantaneous, as concurrent queries may still be utilizing it. Nevertheless, following a DROP, new queries will no longer have access to the plugin. Subsequently, when all ongoing queries involving the plugin have finished, the plugin will be unloaded. If all plugins from the specified library are unloaded, the library will also be automatically unloaded.

mysql> DROP PLUGIN myranker TYPE 'ranker';
Query OK, 0 rows affected (0.00 sec)

DELETE BUDDY PLUGIN

DELETE BUDDY PLUGIN <username/package name on https://packagist.org/>

This action instantly and permanently removes the installed plugin from the plugin_dir. Once removed, the plugin's features will no longer be available.

Example

Example
DELETE BUDDY PLUGIN manticoresoftware/buddy-plugin-show-hostname
¶ 22.3.3.3

Enabling and disabling Buddy plugins

To simplify the control of Buddy plugins, especially when developing a new one or modifying an existing one, the enable and disable Buddy plugin commands are provided. These commands act temporarily during runtime and will reset to their defaults after restarting the daemon or performing a Buddy reset. To permanently disable a plugin, it must be removed.

You need the fully qualified package name of the plugin to enable or disable it. To find it, you can run the SHOW BUDDY PLUGINS query and look for the full qualified name in the package field. For example, the SHOW plugin has the fully qualified name manticoresoftware/buddy-plugin-show.

ENABLE BUDDY PLUGIN

ENABLE BUDDY PLUGIN <username/package name on https://packagist.org/>

This command reactivates a previously disabled Buddy plugin, allowing it to process your requests again.

Example

SQL
ENABLE BUDDY PLUGIN manticoresoftware/buddy-plugin-show

DISABLE BUDDY PLUGIN

DISABLE BUDDY PLUGIN <username/package name on https://packagist.org/>

This command deactivates an active Buddy plugin, preventing it from processing any further requests.

Example

SQL
DISABLE BUDDY PLUGIN manticoresoftware/buddy-plugin-show
After disabling, if you try the `SHOW QUERIES` command, you'll encounter an error because the plugin is disabled.
¶ 22.3.3.4

RELOADING PLUGINS

RELOAD PLUGINS FROM SONAME 'plugin_library'

Reloads all plugins (UDFs, rankers, etc.) from a given library. In a sense, the reload process is transactional, ensuring that:
1. all plugins are successfully updated to their new versions;
2. the update is atomic, meaning all plugins are replaced simultaneously. This atomicity ensures that queries using multiple functions from a reloaded library will never mix old and new versions.

During the RELOAD, the set of plugins is guaranteed to be consistent; they will either be all old or all new.

The reload process is also seamless, as some version of a reloaded plugin will always be available for concurrent queries, without any temporary disruptions. This is an improvement over using a pair of DROP and CREATE statements for reloading. With those, there is a brief window between the DROP and the subsequent CREATE during which queries technically refer to an unknown plugin and will therefore fail.

If there's any failure, RELOAD PLUGINS does nothing, retains the old plugins, and reports an error.

On Windows, overwriting or deleting a DLL library currently in use can be problematic. However, you can still rename it, place a new version under the old name, and then RELOAD will work. After a successful reload, you'll also be able to delete the renamed old library.

mysql> RELOAD PLUGINS FROM SONAME 'udfexample.dll';
Query OK, 0 rows affected (0.00 sec)
¶ 22.3.3.5

Ranker plugins

Ranker plugins let you implement a custom ranker that receives all the occurrences of the keywords matched in the document, and computes a WEIGHT() value. They can be called as follows:

SELECT id, attr1 FROM test WHERE match('hello') OPTION ranker=myranker('option1=1');

The call workflow proceeds as follows:

  1. XXX_init() is invoked once per query per table, at the very beginning. Several query-wide options are passed to it via a SPH_RANKER_INIT structure, including the user options strings (for instance, "option1=1" in the example above).
  2. XXX_update() is called multiple times for each matched document, with every matched keyword occurrence provided as its parameter, a SPH_RANKER_HIT structure. The occurrences within each document are guaranteed to be passed in ascending order of hit->hit_pos values.
  3. XXX_finalize() is called once for each matched document when there are no more keyword occurrences. It must return the WEIGHT() value. This function is the only mandatory one.
  4. XXX_deinit() is invoked once per query, at the very end.
¶ 22.3.3.6

Token filter plugins

Token filter plugins allow you to implement a custom tokenizer that creates tokens according to custom rules. There are two types:

In the text processing pipeline, token filters will run after the base tokenizer processing occurs (which processes the text from fields or queries and creates tokens out of them).

Index-time tokenizer

Index-time tokenizer is created by indexer when indexing source data into a table or by an RT table when processing INSERT or REPLACE statements.

Plugin is declared as library name:plugin name:optional string of settings. The init functions of the plugin can accept arbitrary settings that can be passed as a string in the format option1=value1;option2=value2;...

Example:

index_token_filter = my_lib.so:email_process:field=email;split=.io

The call workflow for index-time token filter is as follows:

  1. XXX_init() gets called right after indexer creates token filter with an empty fields list and then after indexer gets the table schema with the actual fields list. It must return zero for successful initialization or an error description otherwise.
  2. XXX_begin_document gets called only for RT table INSERT/REPLACE for every document. It must return zero for a successful call or an error description otherwise. Using OPTION token_filter_options, additional parameters/settings can be passed to the function.
    sql INSERT INTO rt (id, title) VALUES (1, 'some text corp@space.io') OPTION token_filter_options='.io'
  3. XXX_begin_field gets called once for each field prior to processing the field with the base tokenizer, with the field number as its parameter.
  4. XXX_push_token gets called once for each new token produced by the base tokenizer, with the source token as its parameter. It must return the token, count of extra tokens made by the token filter, and delta position for the token.
  5. XXX_get_extra_token gets called multiple times in case XXX_push_token reports extra tokens. It must return the token and delta position for that extra token.
  6. XXX_end_field gets called once right after the source tokens from the current field are processed.
  7. XXX_deinit gets called at the very end of indexing.

The following functions are mandatory to be defined: XXX_begin_document, XXX_push_token, and XXX_get_extra_token.

query-time token filter

Query-time tokenizer gets created on search each time full-text is invoked by every table involved.

The call workflow for query-time token filter is as follows:

  1. XXX_init() gets called once per table prior to parsing the query with parameters - max token length and a string set by the token_filter option
    sql SELECT * FROM index WHERE MATCH ('test') OPTION token_filter='my_lib.so:query_email_process:io'
    It must return zero for successful initialization or error description otherwise.
  2. XXX_push_token() gets called once for each new token produced by the base tokenizer with parameters: token produced by the base tokenizer, pointer to raw token at source query string, and raw token length. It must return the token and delta position for the token.
  3. XXX_pre_morph() gets called once for the token right before it gets passed to the morphology processor with a reference to the token and stopword flag. It might set the stopword flag to mark the token as a stopword.
  4. XXX_post_morph() gets called once for the token after it is processed by the morphology processor with a reference to the token and stopword flag. It might set the stopword flag to mark the token as a stopword. It must return a flag, the non-zero value of which means to use the token prior to morphology processing.
  5. XXX_deinit() gets called at the very end of query processing.

Absence of the functions is tolerated.

¶ 23

Miscellaneous tools

indextool

indextool is a helpful utility that extracts various information about a physical table, excluding template or distributed tables. Here's the general syntax for utilizing indextool:

indextool <command> [options]

Options

These options are applicable to all commands:

Commands

Here are the available commands:

spelldump

The spelldump command is designed to retrieve the contents from a dictionary file that employs the ispell or MySpell format. This can be handy when you need to compile word lists for wordforms, as it generates all possible forms for you.

Here's the general syntax:

spelldump [options] <dictionary> <affix> [result] [locale-name]

The primary parameters are the main file and the affix file of the dictionary. Typically, these are named as [language-prefix].dict and [language-prefix].aff, respectively. You can find these files in most standard Linux distributions or from numerous online sources.

The [result] parameter is where the extracted dictionary data will be stored, and [locale-name] is the parameter used to specify the locale details of your choice.

There's an optional -c [file] option as well. This option allows you to specify a file for case conversion details.

Here are some usage examples:

spelldump en.dict en.aff
spelldump ru.dict ru.aff ru.txt ru_RU.CP1251
spelldump ru.dict ru.aff ru.txt .1251

The resulting file will list all the words from the dictionary, arranged alphabetically and formatted like a wordforms file. You can then modify this file as per your specific requirements. Here's a sample of what the output file might look like:

zone > zone
zoned > zoned
zoning > zoning

wordbreaker

The wordbreaker tool is designed to deconstruct compound words, a common feature in URLs, into their individual components. For instance, it can dissect "lordoftherings" into four separate words or break down http://manofsteel.warnerbros.com into "man of steel warner bros". This ability enhances search functionality by eliminating the need for prefixes or infixes. To illustrate, a search for "sphinx" wouldn't yield "sphinxsearch" in the results. However, if you apply wordbreaker to disassemble the compound word and index the detached elements, a search will be successful without the file size expansion associated with prefix or infix usage in full-text indexing.

Here are some examples of how to use wordbreaker:

echo manofsteel | bin/wordbreaker -dict dict.txt split
man of steel

The -dict dictionary file is used to separate the input stream into individual words. If no dictionary file is specified, Wordbreaker will look for a file named wordbreaker-dict.txt in the current working directory. (Ensure that the dictionary file matches the language of the compound word you're working with.) The split command breaks words from the standard input and sends the results to the standard output. The test and bench commands are also available to assess the splitting quality and measure the performance of the splitting function, respectively.

Wordbreaker uses a dictionary to identify individual substrings within a given string. To distinguish between multiple potential splits, it considers the relative frequency of each word in the dictionary. A higher frequency indicates a higher likelihood for a word split. To generate a file of this nature, you can use the indexer tool:

indexer --buildstops dict.txt 100000 --buildfreqs myindex -c /path/to/manticore.conf

which will produce a text file named dict.txt that contains the 100,000 most frequently occurring words from myindex, along with their respective counts. Since this output file is a simple text document, you have the flexibility to manually edit it whenever needed. Feel free to add or remove words as required.

¶ 24

OpenAPI specification

The Manticore Search API is documented using the OpenAPI specification, which can be used to generate client SDKs. The machine-readable YAML file is available at https://raw.githubusercontent.com/manticoresoftware/openapi/master/manticore.yml

You can also view the specification visualized with the online Swagger Editor here.

¶ 25

Telemetry

At Manticore, we gather various anonymized metrics to enhance the quality of our products, including Manticore Search. By analyzing this data, we can not only improve the overall performance of our product but also identify which features would be most beneficial to prioritize in order to provide even more value to our users. The telemetry system operates on a separate thread in a non-blocking mode, taking snapshots and sending them once every few minutes.

We take your privacy seriously, and you can rest assured that all metrics are completely anonymous and no sensitive information is transmitted. However, if you still wish to disable telemetry, you have the option to do so by:

Here is a list of all the metrics we collect:

The ⏱️ symbol indicates that the metric is collected periodically, as opposed to other metrics which are collected based on specific events.

Metric Description
invocation Sent when Manticore Buddy is launched
plugin_* Indicates that the plugin with a given name was executed, plugin_backup for backup execution, for example
command_* ⏱️ All metrics with this prefix are sent from the show status query of the Manticore daemon
uptime ⏱️ The uptime of the Manticore Search daemon
workers_total ⏱️ The number of workers used by Manticore
cluster_count ⏱️ How many clusters this node handles
cluster_size ⏱️ How many nodes in all clusters
table_*_count ⏱️ The number of tables created for each type: plain, percolate, rt, or distributed
*_field_*_count ⏱️ The count for each field type for tables with rt and percolate types
columnar ⏱️ Indicates that the Columnar library was used
columnar_field_count ⏱️ The number of fields that use the Columnar library

Backup metrics

The Manticore backup tool sends anonymized metrics to the Manticore metrics server by default in order to help improve the product. If you don't want to send telemetry, you can disable it by running the tool with the --disable-metric flag or by setting the environment variable TELEMETRY=0.

The following is a list of all collected metrics:

Metric Description
invocation Sent when backup was initiated
failed Sent in case of failed backup
done Sent when backup/restore is successful
arg_* The arguments used to run the tool (excluding index names, etc.)
backup_store_versions_fails Indicates failure in saving Manticore version in the backup
backup_table_count Total number of backed up tables
backup_no_permissions Failed backup due to insufficient permissions to destination directory
backup_total_size Total size of the full backup
backup_time Duration of the backup
restore_searchd_running Failed to run restore process due to searchd already running
restore_no_config_file No config file found in the backup during restore
restore_time Duration of the restore
fsync_time Duration of fsync
restore_target_exists Occurs when a folder or index already exists in the destination folder for restore
terminations Indicates that the process was terminated
signal_* The signal used to terminate the process
tables Number of tables in Manticore
config_unreachable Specified configuration file does not exist
config_data_dir_missing Failed to parse data_dir from specified configuration file
config_data_dir_is_relative data_dir path in Manticore instance's configuration file is relative

Labels

Each metric comes with the following labels:

Label Description
collector buddy. Indicates that this metric is collected through Manticore Buddy
os_name Name of the operating system
os_release_name Name from the /etc/os-release if presents or unknown
os_release_version Version from the /etc/os-release if presents or unknown
dockerized If it's run inside the Docker environment
official_docker In case of Docker it's flag that shows we use official image
machine_id Server identifier (the content of /etc/machine-id in Linux)
arch Architecture of the machin we run on
manticore_version Version of Manticore
columnar_version Version of the Columnar library if it is installed
secondary_version Version of the secondary library if the Columnar library is installed
knn_version Version of the KNN library if the Columnar library is installed
buddy_version Version of Manticore Buddy
¶ 26

Changelog

Version 6.3.0 (preparing for release)

While 6.3.0 is being prepared for release, use the dev version which includes all the below changes - https://mnt.cr/dev/nightly

Major changes

Minor changes

Breaking changes and deprecations

Bug fixes

Version 6.2.12

Released: August 23rd 2023

Version 6.2.12 continues the 6.2 series and addresses issues discovered after the release of 6.2.0.

Bugfixes

Minor changes

MCL

Version 6.2.0

Released: August 4th 2023

Major changes

Minor changes

⚠️ Breaking changes

Bugfixes

Version 6.0.4

Released: March 15 2023

New features

Bugfixes

Version 6.0.2

Released: Feb 10 2023

Bugfixes

Version 6.0.0

Released: Feb 7 2023

Starting with this release, Manticore Search comes with Manticore Buddy, a sidecar daemon written in PHP that handles high-level functionality that does not require super low latency or high throughput. Manticore Buddy operates behind the scenes, and you may not even realize it is running. Although it is invisible to the end user, it was a significant challenge to make Manticore Buddy easily installable and compatible with the main C++-based daemon. This major change will allow the team to develop a wide range of new high-level features, such as shards orchestration, access control and authentication, and various integrations like mysqldump, DBeaver, Grafana mysql connector. For now it already handles SHOW QUERIES, BACKUP and Auto schema.

This release also includes more than 130 bug fixes and numerous features, many of which can be considered major.

Major Changes

Minor changes

We are not planning to make the old forms obsolete, but to ensure compatibility with the documentation, we recommend changing the names in your application. What will be changed in a future release is the "index" to "table" rename in the output of various SQL and JSON commands.

If you are running a replication cluster, you'll need to run ALTER TABLE <table name> REBUILD SECONDARY on all the nodes or follow this instruction with just change: run the ALTER .. REBUILD SECONDARY instead of the OPTIMIZE.

Bugfixes

Version 5.0.2

Released: May 30th 2022

Bugfixes

Version 5.0.0

Released: May 18th 2022

Major new features

Previously (note the response time): ```bash $ time curl -v -sX POST http://localhost:9318/bulk -H "Content-Type: application/x-ndjson" --data '{"insert": {"index": "user", "doc": {"name":"Prof. Matt Heaney IV","email":"ibergnaum@yahoo.com","description":"Tempora ullam eaque consequatur. Vero aut minima ut et ut omnis officiis vel. Molestiae quis voluptatum sint numquam.","age":15,"active":1}}} {"insert": {"index": "user", "doc": {"name":"Prof. Boyd McKenzie","email":"carlotta11@hotmail.com","description":"Blanditiis maiores odio corporis eaque illum. Aut et rerum iste. Neque et ullam quisquam officia dignissimos quo cumque.","age":84,"active":1}}} {"insert": {"index": "user", "doc": {"name":"Mr. Johann Smith","email":"stiedemann.tristin@ziemann.com","description":"Temporibus amet magnam consequatur omnis consequatur illo fugit. Debitis natus doloremque est tempore deserunt vero. Harum eos corrupti nemo ut.","age":89,"active":1}}} {"insert": {"index": "user", "doc": {"name":"Hector Pouros","email":"hickle.mafalda@hotmail.com","description":" as voluptatem inventore sit. Aliquam fugit perferendis est id aut odio et sapiente.","age":64,"active":1}}}' * Trying 127.0.0.1... * Connected to localhost (127.0.0.1) port 9318 (#0) > POST /bulk HTTP/1.1 > Host: localhost:9318 > User-Agent: curl/7.47.0 > Accept: */* > Content-Type: application/x-ndjson > Content-Length: 1025 > Expect: 100-continue > * Done waiting for 100-continue * We are completely uploaded and fine < HTTP/1.1 200 OK < Server: 4.2.0 15e927b@211223 release (columnar 1.11.4 327b3d4@211223) < Content-Type: application/json; charset=UTF-8 < Content-Length: 434 < * Connection #0 to host localhost left intact {"items":[{"insert":{"_index":"user","_id":2811798918248005633,"created":true,"result":"created","status":201}},{"insert":{"_index":"user","_id":2811798918248005634,"created":true,"result":"created","status":201}},{"insert":{"_index":"user","_id":2811798918248005635,"created":true,"result":"created","status":201}},{"insert":{"_index":"user","_id":2811798918248005636,"created":true,"result":"created","status":201}}],"errors":false} real 0m1.022s user 0m0.001s sys 0m0.010s ``` Now: ```bash $ time curl -v -sX POST http://localhost:9318/bulk -H "Content-Type: application/x-ndjson" --data '{"insert": {"index": "user", "doc": {"name":"Prof. Matt Heaney IV","email":"ibergnaum@yahoo.com","description":"Tempora ullam eaque consequatur. Vero aut minima ut et ut omnis officiis vel. Molestiae quis voluptatum sint numquam.","age":15,"active":1}}} {"insert": {"index": "user", "doc": {"name":"Prof. Boyd McKenzie","email":"carlotta11@hotmail.com","description":"Blanditiis maiores odio corporis eaque illum. Aut et rerum iste. Neque et ullam quisquam officia dignissimos quo cumque.","age":84,"active":1}}} {"insert": {"index": "user", "doc": {"name":"Mr. Johann Smith","email":"stiedemann.tristin@ziemann.com","description":"Temporibus amet magnam consequatur omnis consequatur illo fugit. Debitis natus doloremque est tempore deserunt vero. Harum eos corrupti nemo ut.","age":89,"active":1}}} {"insert": {"index": "user", "doc": {"name":"Hector Pouros","email":"hickle.mafalda@hotmail.com","description":" as voluptatem inventore sit. Aliquam fugit perferendis est id aut odio et sapiente.","age":64,"active":1}}}' * Trying 127.0.0.1... * Connected to localhost (127.0.0.1) port 9318 (#0) > POST /bulk HTTP/1.1 > Host: localhost:9318 > User-Agent: curl/7.47.0 > Accept: */* > Content-Type: application/x-ndjson > Content-Length: 1025 > Expect: 100-continue > < HTTP/1.1 100 Continue < Server: 4.2.1 63e5749@220405 dev < Content-Type: application/json; charset=UTF-8 < Content-Length: 0 * We are completely uploaded and fine < HTTP/1.1 200 OK < Server: 4.2.1 63e5749@220405 dev < Content-Type: application/json; charset=UTF-8 < Content-Length: 147 < * Connection #0 to host localhost left intact {"items":[{"bulk":{"_index":"user","_id":2811798919590182916,"created":4,"deleted":0,"updated":0,"result":"created","status":201}}],"errors":false} real 0m0.015s user 0m0.005s sys 0m0.004s ```

Minor changes

⚠️ Other minor breaking changes

New packages

The new structure is:
- manticore - deb/rpm meta package which installs all the above as dependencies
- manticore-server-core - searchd and everything to run it alone
- manticore-server - systemd files and other supplementary scripts
- manticore-tools - indexer, indextool and other tools
- manticore-common - default configuration file, default data directory, default stopwords
- manticore-icudata, manticore-dev, manticore-converter didn't change much
- .tgz bundle which includes all the packages

Bugfixes

Version 4.2.0, Dec 23 2021

Major new features

Pseudo sharding on vs off in 4.2.0

4.0.2

4.0.2
It takes **48 seconds to insert 1M PQ rules** and **406 seconds to REPLACE just 40K** in 10K batches.
root@perf3 ~ # mysql -P9306 -h0 -e "drop table if exists pq; create table pq (f text, f2 text, j json, s string) type='percolate';"; date; for m in `seq 1 1000`; do (echo -n "insert into pq (id,query,filters,tags) values "; for n in `seq 1 1000`; do echo -n "(0,'@f (cat | ( angry dog ) | (cute mouse)) @f2 def', 'j.json.language=\"en\"', '{\"tag1\":\"tag1\",\"tag2\":\"tag2\"}')"; [ $n != 1000 ] && echo -n ","; done; echo ";")|mysql -P9306 -h0; done; date; mysql -P9306 -h0 -e "select count(*) from pq"

Wed Dec 22 10:24:30 AM CET 2021
Wed Dec 22 10:25:18 AM CET 2021
+----------+
| count(*) |
+----------+
|  1000000 |
+----------+

root@perf3 ~ # date; (echo "begin;"; for offset in `seq 0 10000 30000`; do n=0; echo "replace into pq (id,query,filters,tags) values "; for id in `mysql -P9306 -h0 -NB -e "select id from pq limit $offset, 10000 option max_matches=1000000"`; do echo "($id,'@f (tiger | ( angry bear ) | (cute panda)) @f2 def', 'j.json.language=\"de\"', '{\"tag1\":\"tag1\",\"tag2\":\"tag2\"}')"; n=$((n+1)); [ $n != 10000 ] && echo -n ","; done; echo ";"; done; echo "commit;") > /tmp/replace.sql; date
Wed Dec 22 10:26:23 AM CET 2021
Wed Dec 22 10:26:27 AM CET 2021
root@perf3 ~ # time mysql -P9306 -h0 < /tmp/replace.sql

real    6m46.195s
user    0m0.035s
sys 0m0.008s
#### 4.2.0
4.2.0
It takes **34 seconds to insert 1M PQ rules** and **23 seconds to REPLACE them** in 10K batches.
root@perf3 ~ # mysql -P9306 -h0 -e "drop table if exists pq; create table pq (f text, f2 text, j json, s string) type='percolate';"; date; for m in `seq 1 1000`; do (echo -n "insert into pq (id,query,filters,tags) values "; for n in `seq 1 1000`; do echo -n "(0,'@f (cat | ( angry dog ) | (cute mouse)) @f2 def', 'j.json.language=\"en\"', '{\"tag1\":\"tag1\",\"tag2\":\"tag2\"}')"; [ $n != 1000 ] && echo -n ","; done; echo ";")|mysql -P9306 -h0; done; date; mysql -P9306 -h0 -e "select count(*) from pq"

Wed Dec 22 10:06:38 AM CET 2021
Wed Dec 22 10:07:12 AM CET 2021
+----------+
| count(*) |
+----------+
|  1000000 |
+----------+

root@perf3 ~ # date; (echo "begin;"; for offset in `seq 0 10000 990000`; do n=0; echo "replace into pq (id,query,filters,tags) values "; for id in `mysql -P9306 -h0 -NB -e "select id from pq limit $offset, 10000 option max_matches=1000000"`; do echo "($id,'@f (tiger | ( angry bear ) | (cute panda)) @f2 def', 'j.json.language=\"de\"', '{\"tag1\":\"tag1\",\"tag2\":\"tag2\"}')"; n=$((n+1)); [ $n != 10000 ] && echo -n ","; done; echo ";"; done; echo "commit;") > /tmp/replace.sql; date
Wed Dec 22 10:12:31 AM CET 2021
Wed Dec 22 10:14:00 AM CET 2021
root@perf3 ~ # time mysql -P9306 -h0 < /tmp/replace.sql

real    0m23.248s
user    0m0.891s
sys 0m0.047s

Minor changes

Breaking changes

Bugfixes

Version 4.0.2, Sep 21 2021

Major new features

- read operations (e.g. SELECTs, replication) are performed with snapshots - operations that just change internal index structure without modifying schema/documents (e.g. merging RAM segments, saving disk chunks, merging disk chunks) are performed with read-only snapshots and replace the existing chunks in the end - UPDATEs and DELETEs are performed against existing chunks, but for the case of merging that may be happening the writes are collected and are then applied against the new chunks - UPDATEs acquire an exclusive lock sequentially for every chunk. Merges acquire a shared lock when entering the stage of collecting attributes from the chunk. So at the same time only one (merge or update) operation has access to attributes of the chunk. - when merging gets to the phase, when it needs attributes it sets a special flag. When UPDATE finishes, it checks the flag, and if it's set, the whole update is stored in a special collection. Finally, when the merge finishes, it applies the updates set to the newborn disk chunk. - ALTER runs via an exclusive lock - replication runs as a usual read operation, but in addition saves the attributes before SST and forbids updates during the SST

Minor changes

3.6.0

3.6.0
time curl -X POST -d '{"update":{"index":"idx","id":4611686018427387905,"doc":{"mode":0}}}' -H "Content-Type: application/x-ndjson" http://127.0.0.1:6358/json/bulk

real    0m43.783s
user    0m0.008s
sys     0m0.007s
#### 4.0.2
4.0.2
time curl -X POST -d '{"update":{"index":"idx","id":4611686018427387905,"doc":{"mode":0}}}' -H "Content-Type: application/x-ndjson" http://127.0.0.1:6358/json/bulk

real    0m0.006s
user    0m0.004s
sys     0m0.001s

Breaking changes

Migration from Manticore 3

Bugfixes

Version 3.6.0, May 3rd 2021

Maintenance release before Manticore 4

Major new features

Minor changes

Optimizations

Bugfixes

Breaking changes:

Deprecations

Version 3.5.4, Dec 10 2020

New Features

Minor Changes

Deprecations

Bugfixes

Version 3.5.2, Oct 1 2020

New features

Minor changes

Deprecations:

Docker

The official Docker image is now based on Ubuntu 20.04 LTS

Packaging

Besides the usual manticore package, you can also install Manticore Search by components:

Bugifixes

  1. Commit 2a47 Crash of daemon at grouper at RT index with different chunks
  2. Commit 57a1 Fastpath for empty remote docs
  3. Commit 07dd Expression stack frame detection runtime
  4. Commit 08ae Matching above 32 fields at percolate indexes
  5. Commit 16b9 Replication listen ports range
  6. Commit 5fa6 Show create table on pq
  7. Commit 54d1 HTTPS port behavior
  8. Commit fdbb Mixing docstore rows when replacing
  9. Commit afb5 Switch TFO unavailable message level to 'info'
  10. Commit 59d9 Crash on strcmp invalid use
  11. Commit 04af Adding index to cluster with system (stopwords) files
  12. Commit 5014 Merge indexes with large dictionaries; RT optimize of large disk chunks
  13. Commit a2ad Indextool can dump meta from current version
  14. Commit 69f6 Issue in group order in GROUP N
  15. Commit 24d5 Explicit flush for SphinxSE after handshake
  16. Commit 31c4 Avoid copy of huge descriptions when not necessary
  17. Commit 2959 Negative time in show threads
  18. Commit f0b3 Token filter plugin vs zero position deltas
  19. Commit a49e Change 'FAIL' to 'WARNING' on multiple hits

Version 3.5.0, 22 Jul 2020

Major new features:

In plain mode it's called sql_field_string. Now it's available in RT mode for real-time indexes too. You can use it as shown in the example:

```sql
create table t(f string attribute indexed);
insert into t values(0,'abc','abc');
select * from t where match('abc');
+---------------------+------+
| id | f |
+---------------------+------+
| 2810845392541843463 | abc |
+---------------------+------+
1 row in set (0.01 sec)

mysql> select * from t where f='abc';
+---------------------+------+
| id | f |
+---------------------+------+
| 2810845392541843463 | abc |
+---------------------+------+
1 row in set (0.00 sec)
```

Minor changes

Breaking changes:

Deprecations:

3.4.2:
sql mysql> describe t; +-------+--------+----------------+ | Field | Type | Properties | +-------+--------+----------------+ | id | bigint | | | f | field | indexed stored | +-------+--------+----------------+

3.5.0:
sql mysql> describe t; +-------+--------+----------------+ | Field | Type | Properties | +-------+--------+----------------+ | id | bigint | | | f | text | indexed stored | +-------+--------+----------------+

Packages

Bugfixes:

  1. Issue #351 searchd memory leak
  2. Commit ceab Tiny read out of bounds in snippets
  3. Commit 1c3e Dangerous write into local variable for crash queries
  4. Commit 26e0 Tiny memory leak of sorter in test 226
  5. Commit d2c7 Huge memory leak in test 226
  6. Commit 0dd8 Cluster shows the nodes are in sync, but count(*) shows different numbers
  7. Commit f1c1 Cosmetic: Duplicate and sometimes lost warning messages in the log
  8. Commit f1c1 Cosmetic: (null) index name in log
  9. Commit 359d Cannot retrieve more than 70M results
  10. Commit 19f3 Can't insert PQ rules with no-columns syntax
  11. Commit bf68 Misleading error message when inserting a document to an index in a cluster
  12. Commit 2cf1 /json/replace and json/update return id in exponent form
  13. Issue #324 Update json scalar properties and mva in the same query
  14. Commit d384 hitless_words doesn't work in RT mode
  15. Commit 5813 ALTER RECONFIGURE in rt mode should be disallowed
  16. Commit 5813 rt_mem_limit gets reset to 128M after searchd restart
  17. highlight() sometimes hangs
  18. Commit 7cd8 Failed to use U+code in RT mode
  19. Commit 2b21 Failed to use wildcard at wordforms at RT mode
  20. Commit e9d0 Fixed SHOW CREATE TABLE vs multiple wordform files
  21. Commit fc90 JSON query without "query" crashes searchd
  22. Manticore official docker couldn't index from mysql 8
  23. Commit 23e0 HTTP /json/insert requires id
  24. Commit bd67 SHOW CREATE TABLE doesn't work for PQ
  25. Commit bd67 CREATE TABLE LIKE doesn't work properly for PQ
  26. Commit 5eac End of line in settings in show index status
  27. Commit cb15 Empty title in "highlight" in HTTP JSON response
  28. Issue #318 CREATE TABLE LIKE infix error
  29. Commit 9040 RT crashes under load
  30. cd512c7d Lost crash log on crash at RT disk chunk
  31. Issue #323 Import table fails and closes the connection
  32. Commit 6275 ALTER reconfigure corrupts a PQ index
  33. Commit 9c1d Searchd reload issues after change index type
  34. Commit 71e2 Daemon crashes on import table with missed files
  35. Issue #322 Crash on select using multiple indexes, group by and ranker = none
  36. Commit c3f5 HIGHLIGHT() doesn't higlight in string attributes
  37. Issue #320 FACET fails to sort on string attribute
  38. Commit 4f1a Error in case of missing data dir
  39. Commit 04f4 access_* are not supported in RT mode
  40. Commit 1c06 Bad JSON objects in strings: 1. CALL PQ returns "Bad JSON objects in strings: 1" when the json is greater than some value.
  41. Commit 32f9 RT-mode inconsistency. In some cases I can't drop the index since it's unknown and can't create it since the directory is not empty.
  42. Issue #319 Crash on select
  43. Commit 22a2 max_xmlpipe2_field = 2M returned warning on 2M field
  44. Issue #342 Query conditions execution bug
  45. Commit dd8d Simple 2 terms search finds a document containing only one term
  46. Commit 9091 It was impossible in PQ to match a json with capital letters in keys
  47. Commit 56da Indexer crashes on csv+docstore
  48. Issue #363 using [null] in json attr in centos 7 causes corrupted inserted data
  49. Major Issue #345 Records not being inserted, count() is random, "replace into" returns OK
  50. max_query_time slows down SELECTs too much
  51. Issue #352 Master-agent communication fails on Mac OS
  52. Issue #328 Error when connecting to Manticore with Connector.Net/Mysql 8.0.19
  53. Commit daa7 Fixed escaping of \0 and optimized performance
  54. Commit 9bc5 Fixed count distinct vs json
  55. Commit 4f89 Fixed drop table at other node failed
  56. Commit 952a Fix crashes on tightly running call pq

Version 3.4.2, 10 April 2020

Critical bugfixes

Version 3.4.0, 26 March 2020

Major changes

Minor changes

Features

Improvements

Bugfixes

Version 3.3.0, 4 February 2020

Features

Improvements

Bugfixes

Version 3.2.2, 19 December 2019

Features

Improvements and changes

Bugfixes

Version 3.2.0, 17 October 2019

Features

Improvements and changes

Bugfixes

Version 3.1.2, 22 August 2019

Features and Improvements

Bugfixes

Version 3.1.0, 16 July 2019

Features and Improvements

Removals

Bugfixes

Version 3.0.2, 31 May 2019

Improvements

Removals

Deprecations

Bugfixes

Version 3.0.0, 6 May 2019

Features and improvements

Behaviour changes

Removed directives

Version 2.8.2 GA, 2 April 2019

Features and improvements

Compiling notes

Cmake minimum version is now 3.13. Compiling requires boost and libssl
development libraries.

Bugfixes

Version 2.8.1 GA, 6 March 2019

Features and improvements

Bugfixes

Version 2.8.0 GA, 28 January 2019

Improvements

Bugfixes

Version 2.7.5 GA, 4 December 2018

Improvements

Bugfixes

Version 2.7.4 GA, 1 November 2018

Improvements

Bugfixes

Version 2.7.3 GA, 26 September 2018

Improvements

Bugfixes

Version 2.7.2 GA, 27 August 2018

Improvements

Bugfixes

Version 2.7.1 GA, 4 July 2018

Improvements

Bugfixes

Version 2.7.0 GA, 11 June 2018

Improvements

Bugfixes

Version 2.6.4 GA, 3 May 2018

Features and improvements

Bugfixes

Version 2.6.3 GA, 28 March 2018

Improvements

Bugfixes

Version 2.6.2 GA, 23 February 2018

Improvements

Bugfixes

Version 2.6.1 GA, 26 January 2018

Improvements

Bugfixes

Version 2.6.0, 29 December 2017

Features and improvements

Bugfixes

Upgrade

In this release we've changed internal protocol used by masters and agents to speak with each other. In case you run Manticoresearch in a distributed environment with multiple instances make sure your first upgrade agents, then the masters.

Version 2.5.1, 23 November 2017

Features and improvements

Bugfixes

Version 2.4.1 GA, 16 October 2017

Features and improvements

Compiling

Manticore Search is built using cmake and the minimum gcc version required for compiling is 4.7.2.

Folders and service

Bugfixes

Version 2.3.3, 06 July 2017

¶ 27

Reporting bugs

Unfortunately, Manticore is not yet 100% bug-free, although the development team is working hard towards that goal. You may encounter some issues from time to time.
It is crucial to report as much information as possible about each bug to fix it effectively.
To fix a bug, either it needs to be reproduced and fixed or its cause needs to be deduced based on the information you provide. To help with this, please follow the instructions below.

Bug-tracker

Bugs and feature requests are tracked on Github. You are welcome to create a new ticket and describe your bug in detail to save time for both you and the developers.

Documentation updates

Updates to the documentation (what you are reading now) are also done on Github.

Crashes

Manticore Search is written in C++, which is a low-level programming language that allows for direct communication with the computer for faster performance. However, there is a drawback to this approach as in rare cases, it may not be possible to elegantly handle a bug by writing an error to a log and skipping the processing of the command that caused the problem. Instead, the program may crash, resulting in it stopping completely and needing to be restarted.

When Manticore Search crashes, it is important to let the Manticore team know by submitting a bug report on GitHub or through Manticore's professional services in your private helpdesk. The Manticore team requires the following information:

  1. The searchd log
  2. The coredump
  3. The query log

Additionally, it would be helpful if you could do the following:
1. Run gdb to inspect the coredump:

gdb /usr/bin/searchd </path/to/coredump>
  1. Find the crashed thread ID in the coredump file name (make sure you have %p in /proc/sys/kernel/core_pattern), e.g. core.work_6.29050.server_name.1637586599 means thread_id=29050
  2. In gdb run:
set pagination off
info threads
# find thread number by it's id (e.g. for `LWP 29050` it will be thread number 8
thread apply all bt
thread <thread number>
bt full
info locals
quit
  1. Provide the outputs

What do I do when Manticore Search hangs?

If Manticore Search hangs, you need to collect some information that may be useful in understanding the cause. Here's how you can do it:

  1. Run show threads option format=all trough a VIP port
  2. collect the lsof output output, as hanging can be caused by too many connections or open file descriptors.
lsof -p `cat /var/run/manticore/searchd.pid`
  1. Dump the core:
gcore `cat /var/run/manticore/searchd.pid`

(It will save the dump to the current directory.)

  1. Install and run gdb:
gdb /usr/bin/searchd `cat /var/run/manticore/searchd.pid`

Note that this will halt your running searchd, but if it's already hanging, it shouldn't be a problem.
5. In gdb run:

set pagination off
info threads
thread apply all bt
quit
  1. Collect all the outputs and files and provide them in a bug report.

For experts: the macros added in this commit can be helpful in debugging.

How to enable saving coredumps on crash?

[root@srv lib]# systemctl set-environment _ADDITIONAL_SEARCHD_PARAMS='--coredump'
[root@srv lib]# systemctl restart manticore
[root@srv lib]# ps aux|grep searchd
mantico+  1955  0.0  0.0  61964  1580 ?        S    11:02   0:00 /usr/bin/searchd --config /etc/manticoresearch/manticore.conf --coredump
mantico+  1956  0.6  0.0 392744  2664 ?        Sl   11:02   0:00 /usr/bin/searchd --config /etc/manticoresearch/manticore.conf --coredump
echo "/cores/core.%e.%p.%h.%t" > /proc/sys/kernel/core_pattern
[root@srv lib]# grep CORE /lib/systemd/system/manticore.service
LimitCORE=infinity

How do I install debug symbols?

Manticore Search and Manticore Columnar Library are written in C++, which results in compiled binary files that execute optimally on your operating system. However, when running a binary, your system does not have full access to the names of variables, functions, methods, and classes. This information is provided in separate "debuginfo" or "symbol packages."

Debug symbols are essential for troubleshooting and debugging, as they allow you to visualize the system state when it crashed, including the names of functions. Manticore Search provides a backtrace in the searchd log and generates a coredump if run with the --coredump flag. Without symbols, all you will see is internal offsets, making it difficult or impossible to decode the cause of the crash. If you need to make a bug report about a crash, the Manticore team will often require debug symbols to assist you.

To install Manticore Search/Manticore Columnar Library debug symbols, you will need to install the *debuginfo* package for CentOS, the *dbgsym* package for Ubuntu and Debian, or the *dbgsymbols* package for Windows and macOS. These packages should be the same version as the installed Manticore. For example, if you installed Manticore Search in Centos 8 from the package https://repo.manticoresearch.com/repository/manticoresearch/release/centos/8/x86_64/manticore-4.0.2_210921.af497f245-1.el8.x86_64.rpm , the corresponding package with symbols would be https://repo.manticoresearch.com/repository/manticoresearch/release/centos/8/x86_64/manticore-debuginfo-4.0.2_210921.af497f245-1.el8.x86_64.rpm

Note that both packages have the same commit id af497f245, which corresponds to the commit that this version was built from.

If you have installed Manticore from a Manticore APT/YUM repository, you can use one of the following tools:

to find a debug symbols package for you.

How to check the debug symbols exist?

  1. Find the build ID in the output of file /usr/bin/searchd:
[root@srv lib]# file /usr/bin/searchd
/usr/bin/searchd: ELF 64-bit LSB executable, x86-64, version 1 (GNU/Linux), dynamically linked (uses shared libs), for GNU/Linux 2.6.32, BuildID[sha1]=2c582e9f564ea1fbeb0c68406c271ba27034a6d3, stripped

In this case, the build ID is 2c582e9f564ea1fbeb0c68406c271ba27034a6d3.

  1. Find symbols in /usr/lib/debug/.build-id like this:
[root@srv ~]# ls -la /usr/lib/debug/.build-id/2c/582e9f564ea1fbeb0c68406c271ba27034a6d3*
lrwxrwxrwx. 1 root root 23 Nov  9 10:42 /usr/lib/debug/.build-id/2c/582e9f564ea1fbeb0c68406c271ba27034a6d3 -> bin/searchd
lrwxrwxrwx. 1 root root 27 Nov  9 10:42 /usr/lib/debug/.build-id/2c/582e9f564ea1fbeb0c68406c271ba27034a6d3.debug -> usr/bin/searchd.debug

Uploading your data

To fix your bug, developers often need to reproduce it locally. To do this, they need your configuration file, table files, binlog (if present), and sometimes source data (such as data from external storages or XML/CSV files) and queries.

Attach your data when you create a ticket on Github. If the data is too large or sensitive, you can upload it to our write-only S3 storage at s3://s3.manticoresearch.com/write-only/. Here's how you can do it using the Minio client:
1. Install the client https://min.io/docs/minio/linux/reference/minio-mc.html#install-mc
For example on 64-bit Linux:

curl https://dl.min.io/client/mc/release/linux-amd64/mc \
  --create-dirs \
  -o $HOME/minio-binaries/mc
chmod +x $HOME/minio-binaries/mc
export PATH=$PATH:$HOME/minio-binaries/
  1. Add our s3 host (use full path to executable or change into its directory): cd $HOME/minio-binaries and then ./mc config host add manticore http://s3.manticoresearch.com:9000 manticore manticore
  2. Copy your files (use full path to executable or change into its directory): cd $HOME/minio-binaries and then ./mc cp -r issue-1234/ manticore/write-only/issue-1234 . Make sure the folder name is unique and best if it corresponds to the issue on GitHub where you described the bug.

DEBUG

DEBUG [ subcommand ]

The DEBUG statement is designed for developers and testers to call various internal or VIP commands. However, it is not intended for production use as the syntax of the subcommand component may change freely in any build.

To view a list of useful commands and DEBUG statement subcommands available in the current context, simply call DEBUG without any parameters.

mysql> debug;
+-------------------------------------------------------------------------+----------------------------------------------------------------------------------------+
| command                                                                 | meaning                                                                                |
+-------------------------------------------------------------------------+----------------------------------------------------------------------------------------+
| flush logs                                                              | emulate USR1 signal                                                                    |
| reload indexes                                                          | emulate HUP signal                                                                     |
| debug token <password>                                                  | calculate token for password                                                           |
| debug malloc_stats                                                      | perform 'malloc_stats', result in searchd.log                                          |
| debug malloc_trim                                                       | pefrorm 'malloc_trim' call                                                             |
| debug sleep <N>                                                         | sleep for <N> seconds                                                                  |
| debug tasks                                                             | display global tasks stat (use select from @@system.tasks instead)                     |
| debug sched                                                             | display task manager schedule (use select from @@system.sched instead)                 |
| debug merge <TBL> [chunk] <X> [into] [chunk] <Y> [option sync=1,byid=0] | For RT table <TBL> merge disk chunk X into disk chunk Y                                |
| debug drop [chunk] <X> [from] <TBL> [option sync=1]                     | For RT table <TBL> drop disk chunk X                                                   |
| debug files <TBL> [option format=all|external]                          | list files belonging to <TBL>. 'all' - including external (wordforms, stopwords, etc.) |
| debug close                                                             | ask server to close connection from it's side                                          |
| debug compress <TBL> [chunk] <X> [option sync=1]                        | Compress disk chunk X of RT table <TBL> (wipe out deleted documents)                   |
| debug split <TBL> [chunk] <X> on @<uservar> [option sync=1]             | Split disk chunk X of RT table <TBL> using set of DocIDs from @uservar                 |
| debug wait <cluster> [like 'xx'] [option timeout=3]                     | wait <cluster> ready, but no more than 3 secs.                                         |
| debug wait <cluster> status <N> [like 'xx'] [option timeout=13]         | wait <cluster> commit achieve <N>, but no more than 13 secs                            |
| debug meta                                                              | Show max_matches/pseudo_shards. Needs set profiling=1                                  |
| debug trace OFF|'path/to/file' [<N>]                                    | trace flow to file until N bytes written, or 'trace OFF'                               |
| debug curl <URL>                                                        | request given url via libcurl                                                          |
+-------------------------------------------------------------------------+----------------------------------------------------------------------------------------+
19 rows in set (0.00 sec)

Same from VIP connection:

mysql> debug;
+-------------------------------------------------------------------------+----------------------------------------------------------------------------------------+
| command                                                                 | meaning                                                                                |
+-------------------------------------------------------------------------+----------------------------------------------------------------------------------------+
| flush logs                                                              | emulate USR1 signal                                                                    |
| reload indexes                                                          | emulate HUP signal                                                                     |
| debug shutdown <password>                                               | emulate TERM signal                                                                    |
| debug crash <password>                                                  | crash daemon (make SIGSEGV action)                                                     |
| debug token <password>                                                  | calculate token for password                                                           |
| debug malloc_stats                                                      | perform 'malloc_stats', result in searchd.log                                          |
| debug malloc_trim                                                       | pefrorm 'malloc_trim' call                                                             |
| debug procdump                                                          | ask watchdog to dump us                                                                |
| debug setgdb on|off                                                     | enable or disable potentially dangerous crash dumping with gdb                         |
| debug setgdb status                                                     | show current mode of gdb dumping                                                       |
| debug sleep <N>                                                         | sleep for <N> seconds                                                                  |
| debug tasks                                                             | display global tasks stat (use select from @@system.tasks instead)                     |
| debug sched                                                             | display task manager schedule (use select from @@system.sched instead)                 |
| debug merge <TBL> [chunk] <X> [into] [chunk] <Y> [option sync=1,byid=0] | For RT table <TBL> merge disk chunk X into disk chunk Y                                |
| debug drop [chunk] <X> [from] <TBL> [option sync=1]                     | For RT table <TBL> drop disk chunk X                                                   |
| debug files <TBL> [option format=all|external]                          | list files belonging to <TBL>. 'all' - including external (wordforms, stopwords, etc.) |
| debug close                                                             | ask server to close connection from it's side                                          |
| debug compress <TBL> [chunk] <X> [option sync=1]                        | Compress disk chunk X of RT table <TBL> (wipe out deleted documents)                   |
| debug split <TBL> [chunk] <X> on @<uservar> [option sync=1]             | Split disk chunk X of RT table <TBL> using set of DocIDs from @uservar                 |
| debug wait <cluster> [like 'xx'] [option timeout=3]                     | wait <cluster> ready, but no more than 3 secs.                                         |
| debug wait <cluster> status <N> [like 'xx'] [option timeout=13]         | wait <cluster> commit achieve <N>, but no more than 13 secs                            |
| debug meta                                                              | Show max_matches/pseudo_shards. Needs set profiling=1                                  |
| debug trace OFF|'path/to/file' [<N>]                                    | trace flow to file until N bytes written, or 'trace OFF'                               |
| debug curl <URL>                                                        | request given url via libcurl                                                          |
+-------------------------------------------------------------------------+----------------------------------------------------------------------------------------+
24 rows in set (0.00 sec)

All debug XXX commands should be regarded as non-stable and subject to modification at any time, so don't be surprised if they change. This example output may not reflect the actual available commands, so try it on your system to see what is available on your instance. Additionally, there is no detailed documentation provided aside from this short 'meaning' column.

As a quick illustration, two commands available only to VIP clients are described below - shutdown and crash. Both require a token, which can be generated with the debug token subcommand, and added to the shutdown_token param in the searchd section of the config file. If no such section exists, or if the provided password hash does not match the token stored in the config, the subcommands will do nothing.

mysql> debug token hello;
+-------------+------------------------------------------+
| command     | result                                   |
+-------------+------------------------------------------+
| debug token | aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d |
+-------------+------------------------------------------+
1 row in set (0,00 sec)

The subcommand shutdown will send a TERM signal to the server, causing it to shut down. This can be dangerous, as nobody wants to accidentally stop a production service. Therefore, it requires a VIP connection and the password to be used.

The subcommand crash literally causes a crash. It may be used for testing purposes, such as to test how the system manager maintains the service's liveness or to test the feasibility of tracking coredumps.

If some commands are found to be useful in a more general context, they may be moved from the debug subcommands to a more stable and generic location (as exemplified by the debug tasks and debug sched in the table).

¶ 28

References

SQL commands

Schema management
Data management
Backup
SELECT
Flushing misc things
Real-time table optimization
Importing to a real-time table
Replication
Plain table rotate
Transactions
CALL
Plugins
Server status

HTTP endpoints

Common things

Common table settings
Plain table settings
Distributed table settings
RT table settings

Full-text search operators

Functions

Mathematical
Searching and ranking
Type casting
Arrays and conditions
Date and time
Geo-spatial
String
Other

Common settings in configuration file

To be put to section common {} in configuration file:

Indexer

indexer is a tool to create plain tables

Indexer settings in configuration file

To be put to section indexer {} in configuration file:

Indexer start parameters
indexer [OPTIONS] [indexname1 [indexname2 [...]]]

Table Converter for Manticore v2 / Sphinx v2

index_converter is a tool designed to convert tables created with Sphinx/Manticore Search 2.x into the Manticore Search 3.x table format.

index_converter {--config /path/to/config|--path}
Table converter start parameters

Searchd

searchd is the Manticore server.

Searchd settings in a configuration file

To be put in the searchd {} section of the configuration file:

Searchd start parameters
searchd [OPTIONS]
Searchd environment variables

Indextool

Assorted table maintenance features helpful for troubleshooting.

indextool <command> [options]
Indextool Start Parameters

Utilized for dumping various debug information related to the physical table.

indextool <command> [options]

Wordbreaker

Splits compound words into their components.

wordbreaker [-dict path/to/dictionary_file] {split|test|bench}
Wordbreaker start parameters

Spelldump

Extracts the contents of a dictionary file using ispell or MySpell format

spelldump [options] <dictionary> <affix> [result] [locale-name]

List of reserved keywords

A comprehensive alphabetical list of keywords currently reserved in Manticore SQL syntax (thus, they cannot be used as identifiers).

AND, AS, BY, COLUMNARSCAN, DATE_ADD, DATE_SUB, DAY, DISTINCT, DIV, DOCIDINDEX, EXPLAIN, FACET, FALSE, FORCE, FROM, HOUR, IGNORE, IN, INTERVAL, INDEXES, INNER, IS, JOIN, KNN, LEFT, LIMIT, MINUTE, MOD, MONTH, NOT, NO_COLUMNARSCAN, NO_DOCIDINDEX, NO_SECONDARYINDEX, NULL, OFFSET, ON, OR, ORDER, QUARTER, REGEX, RELOAD, SECOND, SECONDARYINDEX, SELECT, SYSFILTERS, TRUE, WEEK, YEAR

Documentation for old Manticore versions

¶ 28.1

References

SQL commands

Schema management
Data management
Backup
SELECT
Flushing misc things
Real-time table optimization
Importing to a real-time table
Replication
Plain table rotate
Transactions
CALL
Plugins
Server status

HTTP endpoints

Common things

Common table settings
Plain table settings
Distributed table settings
RT table settings

Full-text search operators

Functions

Mathematical
Searching and ranking
Type casting
Arrays and conditions
Date and time
Geo-spatial
String
Other

Common settings in configuration file

To be put to section common {} in configuration file:

Indexer

indexer is a tool to create plain tables

Indexer settings in configuration file

To be put to section indexer {} in configuration file:

Indexer start parameters
indexer [OPTIONS] [indexname1 [indexname2 [...]]]

Table Converter for Manticore v2 / Sphinx v2

index_converter is a tool designed to convert tables created with Sphinx/Manticore Search 2.x into the Manticore Search 3.x table format.

index_converter {--config /path/to/config|--path}
Table converter start parameters

Searchd

searchd is the Manticore server.

Searchd settings in a configuration file

To be put in the searchd {} section of the configuration file:

Searchd start parameters
searchd [OPTIONS]
Searchd environment variables

Indextool

Assorted table maintenance features helpful for troubleshooting.

indextool <command> [options]
Indextool Start Parameters

Utilized for dumping various debug information related to the physical table.

indextool <command> [options]

Wordbreaker

Splits compound words into their components.

wordbreaker [-dict path/to/dictionary_file] {split|test|bench}
Wordbreaker start parameters

Spelldump

Extracts the contents of a dictionary file using ispell or MySpell format

spelldump [options] <dictionary> <affix> [result] [locale-name]

List of reserved keywords

A comprehensive alphabetical list of keywords currently reserved in Manticore SQL syntax (thus, they cannot be used as identifiers).

AND, AS, BY, COLUMNARSCAN, DATE_ADD, DATE_SUB, DAY, DISTINCT, DIV, DOCIDINDEX, EXPLAIN, FACET, FALSE, FORCE, FROM, HOUR, IGNORE, IN, INTERVAL, INDEXES, INNER, IS, JOIN, KNN, LEFT, LIMIT, MINUTE, MOD, MONTH, NOT, NO_COLUMNARSCAN, NO_DOCIDINDEX, NO_SECONDARYINDEX, NULL, OFFSET, ON, OR, ORDER, QUARTER, REGEX, RELOAD, SECOND, SECONDARYINDEX, SELECT, SYSFILTERS, TRUE, WEEK, YEAR

Documentation for old Manticore versions